Re: How to handle the UUID in Spark 1.3.1
This is related: SPARK-10501

On Fri, Oct 9, 2015 at 7:28 AM, java8964 wrote:
> Hi, Sparkers:
>
> In this case, I want to use Spark as an ETL engine to load the data from
> Cassandra, and save it into HDFS.
>
> [remainder of the original message, including the full stack trace, is quoted in full below; trimmed here]
How to handle the UUID in Spark 1.3.1
Hi, Sparkers:

In this case, I want to use Spark as an ETL engine to load the data from Cassandra and save it into HDFS.

Here is the environment information:

Spark 1.3.1
Cassandra 2.1
HDFS/Hadoop 2.2

I am using the Cassandra Spark Connector 1.3.x, and I have no problem querying the C* data in Spark. But I have a problem trying to save the data into HDFS, like below:

val df = sqlContext.load("org.apache.spark.sql.cassandra",
  options = Map("c_table" -> "table_name", "keyspace" -> "keyspace_name"))
df: org.apache.spark.sql.DataFrame = [account_id: bigint, campaign_id: uuid, business_info_ids: array, closed_date: timestamp, compliance_hold: boolean, contacts_list_id: uuid, contacts_list_seq: bigint, currency_type: string, deleted_date: timestamp, discount_info: map, end_date: timestamp, insert_by: string, insert_time: timestamp, last_update_by: string, last_update_time: timestamp, name: string, parent_id: uuid, publish_date: timestamp, share_incentive: map, start_date: timestamp, version: int]

scala> df.count
res12: Long = 757704

I can also dump the data using df.first, without any problem.

But when I try to save it:

scala> df.save("hdfs://location", "parquet")
java.lang.RuntimeException: Unsupported datatype UUIDType
    at scala.sys.package$.error(package.scala:27)
    at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$fromDataType$2.apply(ParquetTypes.scala:372)
    at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$fromDataType$2.apply(ParquetTypes.scala:316)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.sql.parquet.ParquetTypesConverter$.fromDataType(ParquetTypes.scala:315)
    at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$4.apply(ParquetTypes.scala:395)
    at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$4.apply(ParquetTypes.scala:394)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.AbstractTraversable.map(Traversable.scala:105)
    at org.apache.spark.sql.parquet.ParquetTypesConverter$.convertFromAttributes(ParquetTypes.scala:393)
    at org.apache.spark.sql.parquet.ParquetTypesConverter$.writeMetaData(ParquetTypes.scala:440)
    at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache.prepareMetadata(newParquet.scala:260)
    at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$6.apply(newParquet.scala:276)
    at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$6.apply(newParquet.scala:269)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.AbstractTraversable.map(Traversable.scala:105)
    at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache.refresh(newParquet.scala:269)
    at org.apache.spark.sql.parquet.ParquetRelation2.<init>(newParquet.scala:391)
    at org.apache.spark.sql.parquet.DefaultSource.createRelation(newParquet.scala:98)
    at org.apache.spark.sql.parquet.DefaultSource.createRelation(newParquet.scala:128)
    at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:240)
    at org.apache.spark.sql.DataFrame.save(DataFrame.scala:1196)
    at org.apache.spark.sql.DataFrame.save(DataFrame.scala:1156)
    ... (REPL wrapper and reflection frames omitted)
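To see which columns trigger the error, the uuid-typed fields can be listed from the DataFrame's schema. A minimal sketch against the df above, assuming the type string the connector reports contains "uuid" (inspect df.dtypes on your build to confirm):

// List the columns whose reported type mentions "uuid".
// df.dtypes returns Array[(columnName, typeString)] in Spark 1.3.
val uuidColumns = df.dtypes.collect {
  case (name, tpe) if tpe.toLowerCase.contains("uuid") => name
}
println(uuidColumns.mkString(", "))
// From the schema above this should print: campaign_id, contacts_list_id, parent_id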
RE: How to handle the UUID in Spark 1.3.1
Thanks, Ted.

Does this mean I am out of luck for now? If I use HiveContext, and cast the UUID as a string, will it work?

Yong

--
Date: Fri, 9 Oct 2015 09:09:38 -0700
Subject: Re: How to handle the UUID in Spark 1.3.1
From: yuzhih...@gmail.com
To: java8...@hotmail.com
CC: user@spark.apache.org

> This is related: SPARK-10501
>
> On Fri, Oct 9, 2015 at 7:28 AM, java8964 wrote:
> [original message and stack trace quoted in full above; trimmed here]
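One way to try the cast Yong asks about is selectExpr with SQL CAST expressions, which is available on a plain SQLContext as well as a HiveContext. A minimal sketch against the schema shown earlier; whether Spark 1.3.1 actually accepts a cast from the connector's UUID type to string is an assumption to verify:

// Project the uuid columns through CAST(... AS string), then save.
// Only a few columns are shown; list the remaining columns as well,
// otherwise they are dropped from the projection.
val casted = df.selectExpr(
  "account_id",
  "CAST(campaign_id AS string) AS campaign_id",
  "CAST(contacts_list_id AS string) AS contacts_list_id",
  "CAST(parent_id AS string) AS parent_id"
)
casted.save("hdfs://location", "parquet")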
Re: How to handle the UUID in Spark 1.3.1
I guess that should work :-)

On Fri, Oct 9, 2015 at 10:46 AM, java8964 wrote:
> Thanks, Ted.
>
> Does this mean I am out of luck for now? If I use HiveContext, and cast
> the UUID as string, will it work?
>
> Yong
>
> [earlier messages and stack trace quoted in full above; trimmed here]
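If the cast does work, the projection does not have to be typed out by hand: it can be derived from the schema, casting every uuid-typed column and passing everything else through. Same caveats as above; the "uuid" type-string match and the cast itself are assumptions to check:

// Build one SELECT expression per column: cast uuid columns to string,
// keep all other columns by name, then save the result as Parquet.
val exprs = df.dtypes.map {
  case (name, tpe) if tpe.toLowerCase.contains("uuid") =>
    s"CAST($name AS string) AS $name"
  case (name, _) => name
}
val stringified = df.selectExpr(exprs: _*)
stringified.save("hdfs://location", "parquet")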