Re: SparkR Supported Types - Please add bigint

2015-08-07 Thread Davies Liu
They are actually the same thing, LongType. `long` is friendly for
developers; `bigint` is friendly for database people, and maybe data
scientists.
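For what it's worth, the relationship can be pictured with a toy sketch (plain Python, not Spark's actual class; the two name strings are the ones quoted in this thread):

```python
# Toy model of Spark's LongType: one underlying 64-bit integer type,
# two string representations. printSchema() shows typeName ("long"),
# while the SQL-flavoured name is simpleString ("bigint").
class LongType:
    type_name = "long"        # developer-friendly name
    simple_string = "bigint"  # database-friendly name

# Different labels, same type.
assert LongType.type_name != LongType.simple_string
print(LongType.type_name, LongType.simple_string)  # → long bigint
```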

On Thu, Jul 23, 2015 at 11:33 PM, Sun, Rui rui@intel.com wrote:


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



RE: SparkR Supported Types - Please add bigint

2015-07-24 Thread Sun, Rui
printSchema() calls StructField.buildFormattedString() to output schema
information, and buildFormattedString() uses DataType.typeName as the string
representation of the data type.

LongType.typeName = long
LongType.simpleString = bigint

I am not sure about the difference between these two type name representations.
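To illustrate, here is a toy imitation of that behaviour (plain Python, not Spark's actual buildFormattedString(); field names borrowed from the schema below):

```python
# printSchema()-style output picks typeName, so a field whose
# simpleString is "bigint" prints as "long".
SIMPLE_TO_TYPENAME = {"bigint": "long"}  # mapping for LongType only

def print_schema(fields):
    lines = ["root"]
    for name, simple in fields:
        type_name = SIMPLE_TO_TYPENAME.get(simple, simple)
        lines.append(f" |-- {name}: {type_name} (nullable = true)")
    return "\n".join(lines)

fields = [("timestamp", "bigint"), ("title", "string"), ("version", "bigint")]
print(print_schema(fields))
```

Running this prints `timestamp` and `version` as `long`, even though their simpleString is `bigint`, matching what Exie saw.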

-Original Message-
From: Exie [mailto:tfind...@prodevelop.com.au] 
Sent: Friday, July 24, 2015 1:35 PM
To: user@spark.apache.org
Subject: Re: SparkR Supported Types - Please add bigint

Interestingly, after more digging, df.printSchema() in raw Spark shows the
columns as long, not bigint.




RE: SparkR Supported Types - Please add bigint

2015-07-23 Thread Sun, Rui
Exie,

Reported your issue: https://issues.apache.org/jira/browse/SPARK-9302

SparkR has support for the long (bigint) type in serde. This issue is about
supporting complex Scala types in serde.

-Original Message-
From: Exie [mailto:tfind...@prodevelop.com.au] 
Sent: Friday, July 24, 2015 10:26 AM
To: user@spark.apache.org
Subject: SparkR Supported Types - Please add bigint

Hi Folks,

Using Spark to read in JSON files and detect the schema gives me a
dataframe with a bigint field. R then fails to import the dataframe, as it
can't convert the type.

> head(mydf)
Error in as.data.frame.default(x[[i]], optional = TRUE) : 
  cannot coerce class "jobj" to a data.frame

> show(mydf)
DataFrame[localEventDtTm:timestamp, asset:string, assetCategory:string, 
assetType:string, event:string, 
extras:array<struct<name:string,value:string>>, ipAddress:string, 
memberId:string, system:string, timestamp:bigint, title:string, 
trackingId:string, version:bigint]


I believe this is related to:
https://issues.apache.org/jira/browse/SPARK-8840

A sample record in raw JSON looks like this:
{version: 1,event: view,timestamp: 1427846422377,system:
DCDS,asset: 6404476,assetType: myType,assetCategory:
myCategory,extras: [{name: videoSource,value: mySource},{name:
playerType,value: Article},{name: duration,value:
202088}],trackingId: 155629a0-d802-11e4-13ee-6884e43d6000,ipAddress:
165.69.2.4,title: myTitle}
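One plausible reason the timestamp comes back as bigint: 1427846422377 is an epoch-millisecond value that does not fit in a signed 32-bit integer, so schema inference has to pick a 64-bit type. A quick check (plain Python; 2**31 - 1 is the standard signed 32-bit maximum, which is also the range of R's native integer type):

```python
# The sample record's timestamp overflows 32 bits, forcing a 64-bit
# (long / bigint) column rather than int.
INT32_MAX = 2**31 - 1     # 2147483647, also R's .Machine$integer.max
timestamp = 1427846422377 # epoch milliseconds from the sample JSON

assert timestamp > INT32_MAX
print(timestamp > INT32_MAX)  # → True
```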

Can someone turn this into a feature request or something for 1.5.0?



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/SparkR-Supported-Types-Please-add-bigint-tp23975.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional 
commands, e-mail: user-h...@spark.apache.org


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: SparkR Supported Types - Please add bigint

2015-07-23 Thread Exie
Interestingly, after more digging, df.printSchema() in raw Spark shows the
columns as long, not bigint.

root
 |-- localEventDtTm: timestamp (nullable = true)
 |-- asset: string (nullable = true)
 |-- assetCategory: string (nullable = true)
 |-- assetType: string (nullable = true)
 |-- event: string (nullable = true)
 |-- extras: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- name: string (nullable = true)
 |    |    |-- value: string (nullable = true)
 |-- ipAddress: string (nullable = true)
 |-- memberId: string (nullable = true)
 |-- system: string (nullable = true)
 |-- timestamp: long (nullable = true)
 |-- title: string (nullable = true)
 |-- trackingId: string (nullable = true)
 |-- version: long (nullable = true)

I'm going to have to keep digging I guess. :(




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/SparkR-Supported-Types-Please-add-bigint-tp23975p23978.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org