Re: query avro hive table in spark sql
Any idea what's causing this error?

15/08/28 21:03:03 WARN scheduler.TaskSetManager: Lost task 34.0 in stage 9.0 (TID 20, dtord01hdw0228p.dc.dotomi.net): java.lang.RuntimeException: cannot find field message_campaign_id from [0:error_error_error_error_error_error_error, 1:cannot_determine_schema, 2:check, 3:schema, 4:url, 5:and, 6:literal]
    at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:410)
    at org.apache.hadoop.hive.serde2.objectinspector.StandardStructObjectInspector.getStructFieldRef(StandardStructObjectInspector.java:147)
    at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$12.apply(TableReader.scala:278)
    at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$12.apply(TableReader.scala:277)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.AbstractTraversable.map(Traversable.scala:105)
    at org.apache.spark.sql.hive.HadoopTableReader$.fillObject(TableReader.scala:277)
    at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$4$$anonfun$9.apply(TableReader.scala:194)
    at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$4$$anonfun$9.apply(TableReader.scala:188)
    at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
    at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

On Thu, Aug 27, 2015 at 12:02 PM, Michael Armbrust mich...@databricks.com wrote:

> BTW, spark-avro works great in our experience, but still, some non-tech people just want to use it as a SQL shell in Spark, like the Hive CLI.

To clarify: you can still use the spark-avro library with pure SQL. Just use the CREATE TABLE ... USING com.databricks.spark.avro OPTIONS (path '...') syntax.
RE: query avro hive table in spark sql
What version of Hive are you using? And did you compile Spark against the right version of Hive?

BTW, spark-avro works great in our experience, but still, some non-tech people just want to use it as a SQL shell in Spark, like the Hive CLI.

Yong

From: mich...@databricks.com
Date: Wed, 26 Aug 2015 17:48:44 -0700
Subject: Re: query avro hive table in spark sql
To: gpatc...@gmail.com
CC: user@spark.apache.org

I'd suggest looking at http://spark-packages.org/package/databricks/spark-avro

On Wed, Aug 26, 2015 at 11:32 AM, gpatcham gpatc...@gmail.com wrote:

Hi,

I'm trying to query a Hive table that is based on Avro in Spark SQL and am seeing the errors below.

15/08/26 17:51:12 WARN avro.AvroSerdeUtils: Encountered AvroSerdeException determining schema. Returning signal schema to indicate problem
org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Neither avro.schema.literal nor avro.schema.url specified, can't determine table schema
    at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrThrowException(AvroSerdeUtils.java:68)
    at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrReturnErrorSchema(AvroSerdeUtils.java:93)
    at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:60)
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:375)
    at org.apache.hadoop.hive.ql.metadata.Partition.getDeserializer(Partition.java:249)

It's not able to determine the schema. The Hive table points to the Avro schema via a URL. I'm stuck and couldn't find more info on this. Any pointers?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/query-avro-hive-table-in-spark-sql-tp24462.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
Re: query avro hive table in spark sql
I was using a different build of Spark, compiled against a different version of Hive, before. The error I see now is:

org.apache.hadoop.hive.serde2.avro.BadSchemaException
    at org.apache.hadoop.hive.serde2.avro.AvroSerDe.deserialize(AvroSerDe.java:195)
    at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:321)
    at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:320)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$6.apply(Aggregate.scala:128)
    at org.apache.spark.sql.execution.Aggregate$$anonfun$execute$1$$anonfun$6.apply(Aggregate.scala:124)
    at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
    at org.apache.spark.rdd.RDD$$anonfun$14.apply(RDD.scala:634)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1191)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1191)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)

On Thu, Aug 27, 2015 at 10:38 AM, java8964 java8...@hotmail.com wrote:

You can run Hive queries with spark-avro, but you cannot query a Hive view that way, since the view definition is stored in the Hive metastore.

What do you mean by the right version of Spark fixing the "can't determine table schema" problem? I faced this problem before, and my guess is that a Hive library mismatch caused it, but I'm not sure. I never hit your second problem; can you post the whole stack trace for that error? Most of our datasets are also in Avro format.

Yong

Date: Thu, 27 Aug 2015 09:45:45 -0700
Subject: Re: query avro hive table in spark sql
From: gpatc...@gmail.com
To: java8...@hotmail.com
CC: mich...@databricks.com; user@spark.apache.org

Can we run Hive queries using spark-avro? In our case it's not just reading the Avro file; we have a view in Hive that is based on multiple tables.

On Thu, Aug 27, 2015 at 9:41 AM, Giri P gpatc...@gmail.com wrote:

We are using Hive 1.1. I was able to fix the error below when I used the right version of Spark:

15/08/26 17:51:12 WARN avro.AvroSerdeUtils: Encountered AvroSerdeException determining schema. Returning signal schema to indicate problem
org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Neither avro.schema.literal nor avro.schema.url specified, can't determine table schema
    at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrThrowException(AvroSerdeUtils.java:68)
    at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrReturnErrorSchema(AvroSerdeUtils.java:93)
    at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:60)
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:375)
RE: query avro hive table in spark sql
You can run Hive queries with spark-avro, but you cannot query a Hive view that way, since the view definition is stored in the Hive metastore.

What do you mean by the right version of Spark fixing the "can't determine table schema" problem? I faced this problem before, and my guess is that a Hive library mismatch caused it, but I'm not sure. I never hit your second problem; can you post the whole stack trace for that error? Most of our datasets are also in Avro format.

Yong

Date: Thu, 27 Aug 2015 09:45:45 -0700
Subject: Re: query avro hive table in spark sql
From: gpatc...@gmail.com
To: java8...@hotmail.com
CC: mich...@databricks.com; user@spark.apache.org

Can we run Hive queries using spark-avro? In our case it's not just reading the Avro file; we have a view in Hive that is based on multiple tables.

On Thu, Aug 27, 2015 at 9:41 AM, Giri P gpatc...@gmail.com wrote:

We are using Hive 1.1. I was able to fix the error below when I used the right version of Spark:

15/08/26 17:51:12 WARN avro.AvroSerdeUtils: Encountered AvroSerdeException determining schema. Returning signal schema to indicate problem
org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Neither avro.schema.literal nor avro.schema.url specified, can't determine table schema
    at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrThrowException(AvroSerdeUtils.java:68)
    at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrReturnErrorSchema(AvroSerdeUtils.java:93)
    at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:60)
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:375)
    at org.apache.hadoop.hive.ql.metadata.Partition.getDeserializer(Partition.java:249)

But I still see this error when querying some Hive Avro tables.
15/08/26 17:51:27 WARN scheduler.TaskSetManager: Lost task 30.0 in stage 0.0 (TID 14, dtord01hdw0227p.dc.dotomi.net): org.apache.hadoop.hive.serde2.avro.BadSchemaException
    at org.apache.hadoop.hive.serde2.avro.AvroSerDe.deserialize(AvroSerDe.java:91)
    at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:321)
    at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:320)

I haven't tried spark-avro. We are using SQLContext to run queries in our application. Any idea if this issue might be because we are querying across different schema versions of the data?

Thanks,
Giri

On Thu, Aug 27, 2015 at 5:39 AM, java8964 java8...@hotmail.com wrote:

What version of Hive are you using? And did you compile Spark against the right version of Hive?

BTW, spark-avro works great in our experience, but still, some non-tech people just want to use it as a SQL shell in Spark, like the Hive CLI.

Yong

From: mich...@databricks.com
Date: Wed, 26 Aug 2015 17:48:44 -0700
Subject: Re: query avro hive table in spark sql
To: gpatc...@gmail.com
CC: user@spark.apache.org

I'd suggest looking at http://spark-packages.org/package/databricks/spark-avro

On Wed, Aug 26, 2015 at 11:32 AM, gpatcham gpatc...@gmail.com wrote:

Hi,

I'm trying to query a Hive table that is based on Avro in Spark SQL and am seeing the errors below.

15/08/26 17:51:12 WARN avro.AvroSerdeUtils: Encountered AvroSerdeException determining schema. Returning signal schema to indicate problem
org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Neither avro.schema.literal nor avro.schema.url specified, can't determine table schema
    at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrThrowException(AvroSerdeUtils.java:68)
    at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrReturnErrorSchema(AvroSerdeUtils.java:93)
    at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:60)
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:375)
    at org.apache.hadoop.hive.ql.metadata.Partition.getDeserializer(Partition.java:249)

It's not able to determine the schema. The Hive table points to the Avro schema via a URL. I'm stuck and couldn't find more info on this. Any pointers?
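If the BadSchemaException really does stem from partitions written under different Avro schema versions, one way to investigate is to compare the SerDe properties Hive stores for the table and for individual partitions. This is only a sketch; the table name and partition spec below are hypothetical:

```sql
-- Table-level storage information, including avro.schema.url if set:
DESCRIBE FORMATTED campaign_events;

-- The same for a single partition; compare the schema URL and SerDe
-- properties across partitions to spot mismatches:
DESCRIBE FORMATTED campaign_events PARTITION (dt='2015-08-26');
```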
Re: query avro hive table in spark sql
Can we run Hive queries using spark-avro? In our case it's not just reading the Avro file; we have a view in Hive that is based on multiple tables.

On Thu, Aug 27, 2015 at 9:41 AM, Giri P gpatc...@gmail.com wrote:

We are using Hive 1.1. I was able to fix the error below when I used the right version of Spark:

15/08/26 17:51:12 WARN avro.AvroSerdeUtils: Encountered AvroSerdeException determining schema. Returning signal schema to indicate problem
org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Neither avro.schema.literal nor avro.schema.url specified, can't determine table schema
    at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrThrowException(AvroSerdeUtils.java:68)
    at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrReturnErrorSchema(AvroSerdeUtils.java:93)
    at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:60)
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:375)
    at org.apache.hadoop.hive.ql.metadata.Partition.getDeserializer(Partition.java:249)

But I still see this error when querying some Hive Avro tables:

15/08/26 17:51:27 WARN scheduler.TaskSetManager: Lost task 30.0 in stage 0.0 (TID 14, dtord01hdw0227p.dc.dotomi.net): org.apache.hadoop.hive.serde2.avro.BadSchemaException
    at org.apache.hadoop.hive.serde2.avro.AvroSerDe.deserialize(AvroSerDe.java:91)
    at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:321)
    at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:320)

I haven't tried spark-avro. We are using SQLContext to run queries in our application. Any idea if this issue might be because we are querying across different schema versions of the data?

Thanks,
Giri

On Thu, Aug 27, 2015 at 5:39 AM, java8964 java8...@hotmail.com wrote:

What version of Hive are you using? And did you compile Spark against the right version of Hive?

BTW, spark-avro works great in our experience, but still, some non-tech people just want to use it as a SQL shell in Spark, like the Hive CLI.

Yong

From: mich...@databricks.com
Date: Wed, 26 Aug 2015 17:48:44 -0700
Subject: Re: query avro hive table in spark sql
To: gpatc...@gmail.com
CC: user@spark.apache.org

I'd suggest looking at http://spark-packages.org/package/databricks/spark-avro

On Wed, Aug 26, 2015 at 11:32 AM, gpatcham gpatc...@gmail.com wrote:

Hi,

I'm trying to query a Hive table that is based on Avro in Spark SQL and am seeing the errors below.

15/08/26 17:51:12 WARN avro.AvroSerdeUtils: Encountered AvroSerdeException determining schema. Returning signal schema to indicate problem
org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Neither avro.schema.literal nor avro.schema.url specified, can't determine table schema
    at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrThrowException(AvroSerdeUtils.java:68)
    at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrReturnErrorSchema(AvroSerdeUtils.java:93)
    at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:60)
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:375)
    at org.apache.hadoop.hive.ql.metadata.Partition.getDeserializer(Partition.java:249)

It's not able to determine the schema. The Hive table points to the Avro schema via a URL. I'm stuck and couldn't find more info on this. Any pointers?
Re: query avro hive table in spark sql
We are using Hive 1.1. I was able to fix the error below when I used the right version of Spark:

15/08/26 17:51:12 WARN avro.AvroSerdeUtils: Encountered AvroSerdeException determining schema. Returning signal schema to indicate problem
org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Neither avro.schema.literal nor avro.schema.url specified, can't determine table schema
    at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrThrowException(AvroSerdeUtils.java:68)
    at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrReturnErrorSchema(AvroSerdeUtils.java:93)
    at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:60)
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:375)
    at org.apache.hadoop.hive.ql.metadata.Partition.getDeserializer(Partition.java:249)

But I still see this error when querying some Hive Avro tables:

15/08/26 17:51:27 WARN scheduler.TaskSetManager: Lost task 30.0 in stage 0.0 (TID 14, dtord01hdw0227p.dc.dotomi.net): org.apache.hadoop.hive.serde2.avro.BadSchemaException
    at org.apache.hadoop.hive.serde2.avro.AvroSerDe.deserialize(AvroSerDe.java:91)
    at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:321)
    at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:320)

I haven't tried spark-avro. We are using SQLContext to run queries in our application. Any idea if this issue might be because we are querying across different schema versions of the data?

Thanks,
Giri

On Thu, Aug 27, 2015 at 5:39 AM, java8964 java8...@hotmail.com wrote:

What version of Hive are you using? And did you compile Spark against the right version of Hive?

BTW, spark-avro works great in our experience, but still, some non-tech people just want to use it as a SQL shell in Spark, like the Hive CLI.

Yong

From: mich...@databricks.com
Date: Wed, 26 Aug 2015 17:48:44 -0700
Subject: Re: query avro hive table in spark sql
To: gpatc...@gmail.com
CC: user@spark.apache.org

I'd suggest looking at http://spark-packages.org/package/databricks/spark-avro

On Wed, Aug 26, 2015 at 11:32 AM, gpatcham gpatc...@gmail.com wrote:

Hi,

I'm trying to query a Hive table that is based on Avro in Spark SQL and am seeing the errors below.

15/08/26 17:51:12 WARN avro.AvroSerdeUtils: Encountered AvroSerdeException determining schema. Returning signal schema to indicate problem
org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Neither avro.schema.literal nor avro.schema.url specified, can't determine table schema
    at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrThrowException(AvroSerdeUtils.java:68)
    at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrReturnErrorSchema(AvroSerdeUtils.java:93)
    at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:60)
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:375)
    at org.apache.hadoop.hive.ql.metadata.Partition.getDeserializer(Partition.java:249)

It's not able to determine the schema. The Hive table points to the Avro schema via a URL. I'm stuck and couldn't find more info on this. Any pointers?
Re: query avro hive table in spark sql
> BTW, spark-avro works great in our experience, but still, some non-tech people just want to use it as a SQL shell in Spark, like the Hive CLI.

To clarify: you can still use the spark-avro library with pure SQL. Just use the CREATE TABLE ... USING com.databricks.spark.avro OPTIONS (path '...') syntax.
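A minimal sketch of that approach; the table name and path here are made up, and depending on the Spark 1.x version the statement may need to be CREATE TEMPORARY TABLE rather than CREATE TABLE:

```sql
-- Register an Avro file as a table through the spark-avro data source
-- (hypothetical path), then query it with plain SQL:
CREATE TEMPORARY TABLE episodes
USING com.databricks.spark.avro
OPTIONS (path "hdfs:///data/episodes.avro");

SELECT count(*) FROM episodes;
```

This goes through the spark-avro data source directly, so it does not depend on the Hive metastore carrying a valid avro.schema.url or avro.schema.literal property.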
Re: query avro hive table in spark sql
Can you select something from this table using Hive? Also, could you post the Spark code that leads to this exception?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/query-avro-hive-table-in-spark-sql-tp24462p24468.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
query avro hive table in spark sql
Hi,

I'm trying to query a Hive table that is based on Avro in Spark SQL and am seeing the errors below.

15/08/26 17:51:12 WARN avro.AvroSerdeUtils: Encountered AvroSerdeException determining schema. Returning signal schema to indicate problem
org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Neither avro.schema.literal nor avro.schema.url specified, can't determine table schema
    at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrThrowException(AvroSerdeUtils.java:68)
    at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrReturnErrorSchema(AvroSerdeUtils.java:93)
    at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:60)
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:375)
    at org.apache.hadoop.hive.ql.metadata.Partition.getDeserializer(Partition.java:249)

It's not able to determine the schema. The Hive table points to the Avro schema via a URL. I'm stuck and couldn't find more info on this. Any pointers?
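For context, a Hive table that points to its Avro schema by URL is usually declared along these lines (the table name and schema path here are hypothetical). If the avro.schema.url property is missing from the table or partition metadata, or the URL is unreachable from the nodes running the query, the Avro SerDe falls back to the signal schema seen in the warning above:

```sql
-- Hypothetical Avro-backed Hive table; the SerDe fetches the schema
-- from avro.schema.url instead of an inline avro.schema.literal.
CREATE TABLE campaign_events
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ('avro.schema.url'='hdfs:///schemas/campaign_events.avsc');
```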
Re: query avro hive table in spark sql
I'd suggest looking at http://spark-packages.org/package/databricks/spark-avro

On Wed, Aug 26, 2015 at 11:32 AM, gpatcham gpatc...@gmail.com wrote:

Hi,

I'm trying to query a Hive table that is based on Avro in Spark SQL and am seeing the errors below.

15/08/26 17:51:12 WARN avro.AvroSerdeUtils: Encountered AvroSerdeException determining schema. Returning signal schema to indicate problem
org.apache.hadoop.hive.serde2.avro.AvroSerdeException: Neither avro.schema.literal nor avro.schema.url specified, can't determine table schema
    at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrThrowException(AvroSerdeUtils.java:68)
    at org.apache.hadoop.hive.serde2.avro.AvroSerdeUtils.determineSchemaOrReturnErrorSchema(AvroSerdeUtils.java:93)
    at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:60)
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:375)
    at org.apache.hadoop.hive.ql.metadata.Partition.getDeserializer(Partition.java:249)

It's not able to determine the schema. The Hive table points to the Avro schema via a URL. I'm stuck and couldn't find more info on this. Any pointers?