Re: Loading already existing tables in spark shell
Hi Jeetendra,

Please try the following in spark shell. It is like executing an SQL command:

sqlContext.sql("use <database name>")

Regards,
Ishwardeep

From: Jeetendra Gangele <gangele...@gmail.com>
Sent: Tuesday, August 25, 2015 12:57 PM
To: Ishwardeep Singh
Cc: user
Subject: Re: Loading already existing tables in spark shell

In spark shell "use database" is not working, saying "use" not found in the shell. Did you run this in the Scala shell?

On 24 August 2015 at 18:26, Ishwardeep Singh <ishwardeep.si...@impetus.co.in> wrote:

Hi Jeetendra,

I faced this issue. I did not specify the database where this table exists. Please set the database by using the "use <database>" command before executing the query.

Regards,
Ishwardeep

From: Jeetendra Gangele <gangele...@gmail.com>
Sent: Monday, August 24, 2015 5:47 PM
To: user
Subject: Loading already existing tables in spark shell

Hi All,

I have a few tables in Hive and I wanted to run queries against them with Spark as the execution engine. Can I directly load these tables in the spark shell and run queries? I tried:

1. val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
2. sqlContext.sql("FROM event_impressions select count(*)")

where event_impressions is the table name. It gives me an error saying:

org.apache.spark.sql.AnalysisException: no such table event_impressions; line 1 pos 5

Does anybody hit similar issues?

regards
jeetendra

NOTE: This message may contain information that is confidential, proprietary, privileged or otherwise protected by law. The message is intended solely for the named addressee. If received in error, please destroy and notify the sender. Any use of this email is prohibited when received in error. Impetus does not represent, warrant and/or guarantee, that the integrity of this communication has been maintained nor that the communication is free of errors, virus, interception or interference.
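Putting the whole thread together, a minimal spark-shell sketch (the database name "mydb" is a placeholder for your own database; assumes hive-site.xml is on Spark's classpath so the HiveContext can see the metastore):

```scala
// Spark 1.3-era API: a HiveContext gives SQL access to existing Hive tables.
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

// Unqualified table names resolve against the current database
// ("default" unless changed), hence the "no such table" error above.
sqlContext.sql("USE mydb")

// The table now resolves, and the query runs with Spark as the engine.
sqlContext.sql("SELECT COUNT(*) FROM event_impressions").show()
```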
Re: Loading already existing tables in spark shell
Hi Jeetendra,

I faced this issue. I did not specify the database where this table exists. Please set the database by using the "use <database>" command before executing the query.

Regards,
Ishwardeep

From: Jeetendra Gangele <gangele...@gmail.com>
Sent: Monday, August 24, 2015 5:47 PM
To: user
Subject: Loading already existing tables in spark shell

Hi All,

I have a few tables in Hive and I wanted to run queries against them with Spark as the execution engine. Can I directly load these tables in the spark shell and run queries? I tried:

1. val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
2. sqlContext.sql("FROM event_impressions select count(*)")

where event_impressions is the table name. It gives me an error saying:

org.apache.spark.sql.AnalysisException: no such table event_impressions; line 1 pos 5

Does anybody hit similar issues?

regards
jeetendra
RE: Spark SQL support for Hive 0.14
Thanks Steve and Michael for your response. Is there a tentative release date for Spark 1.5?

From: Michael Armbrust [mailto:mich...@databricks.com]
Sent: Tuesday, August 4, 2015 11:53 PM
To: Steve Loughran <ste...@hortonworks.com>
Cc: Ishwardeep Singh <ishwardeep.si...@impetus.co.in>; user@spark.apache.org
Subject: Re: Spark SQL support for Hive 0.14

I'll add that while Spark SQL 1.5 compiles against Hive 1.2.1, it has support for reading from metastores for Hive 0.12 - 1.2.1.

On Tue, Aug 4, 2015 at 9:59 AM, Steve Loughran <ste...@hortonworks.com> wrote:

Spark 1.3.1 and 1.4 only support Hive 0.13. Spark 1.5 is going to be released against Hive 1.2.1; it'll skip Hive 0.14 support entirely and go straight to the currently supported Hive release. See SPARK-8064 for the gory details.

On 3 Aug 2015, at 23:01, Ishwardeep Singh <ishwardeep.si...@impetus.co.in> wrote:

Hi,

Does Spark SQL support Hive 0.14? The documentation refers to Hive 0.13. Is there a way to compile Spark with Hive 0.14? Currently we are using Spark 1.3.1.

Thanks

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-support-for-Hive-0-14-tp24122.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
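For later readers: per Michael's note, Spark 1.5 can read from Hive 0.12 - 1.2.1 metastores even though it compiles against Hive 1.2.1. A sketch of the relevant configuration (property names as documented for Spark 1.5; exact values depend on your deployment):

```properties
# spark-defaults.conf: point Spark SQL at a Hive 0.14 metastore while
# keeping the built-in Hive execution classes.
spark.sql.hive.metastore.version  0.14.0
spark.sql.hive.metastore.jars     maven
```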
Spark SQL support for Hive 0.14
Hi,

Does Spark SQL support Hive 0.14? The documentation refers to Hive 0.13. Is there a way to compile Spark with Hive 0.14? Currently we are using Spark 1.3.1.

Thanks
Re: Unable to query existing hive table from spark sql 1.3.0
Which database is your table in - default or result? By default Spark will look for the table in the default database. If the table exists in the result database, either prefix the table name with the database name, like "select * from result.salarytest", or set the database by executing "use <database name>".
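In spark-shell, the two options above look like this (the database "result" and table "salarytest" are the names from this thread):

```scala
// Option 1: qualify the table name with its database.
sqlContext.sql("SELECT * FROM result.salarytest").show()

// Option 2: switch the current database first, then use the bare name.
sqlContext.sql("USE result")
sqlContext.sql("SELECT * FROM salarytest").show()
```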
RE: [Spark SQL 1.3.1] data frame saveAsTable returns exception
Hi Michael and Ayan,

Thank you for your response to my problem. Michael, do we have a tentative release date for Spark version 1.4?

Regards,
Ishwardeep

From: Michael Armbrust [mailto:mich...@databricks.com]
Sent: Wednesday, May 13, 2015 10:54 PM
To: ayan guha
Cc: Ishwardeep Singh; user
Subject: Re: [Spark SQL 1.3.1] data frame saveAsTable returns exception

I think this is a bug in our date handling that should be fixed in Spark 1.4.

On Wed, May 13, 2015 at 8:23 AM, ayan guha <guha.a...@gmail.com> wrote:

Your stack trace says it can't convert date to integer. You sure about column positions?

On 13 May 2015 21:32, Ishwardeep Singh <ishwardeep.si...@impetus.co.in> wrote:

Hi,

I am using Spark SQL 1.3.1. I have created a dataFrame using the jdbc data source and am using the saveAsTable() method, but got the following 2 exceptions:

java.lang.RuntimeException: Unsupported datatype DecimalType()
  at scala.sys.package$.error(package.scala:27)
  at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$fromDataType$2.apply(ParquetTypes.scala:372)
  at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$fromDataType$2.apply(ParquetTypes.scala:316)
  at scala.Option.getOrElse(Option.scala:120)
  at org.apache.spark.sql.parquet.ParquetTypesConverter$.fromDataType(ParquetTypes.scala:315)
  at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$4.apply(ParquetTypes.scala:395)
  at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$4.apply(ParquetTypes.scala:394)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
  at scala.collection.immutable.List.foreach(List.scala:318)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
  at scala.collection.AbstractTraversable.map(Traversable.scala:105)
  at org.apache.spark.sql.parquet.ParquetTypesConverter$.convertFromAttributes(ParquetTypes.scala:393)
  at org.apache.spark.sql.parquet.ParquetTypesConverter$.writeMetaData(ParquetTypes.scala:440)
  at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache.prepareMetadata(newParquet.scala:260)
  at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$6.apply(newParquet.scala:276)
  at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$6.apply(newParquet.scala:269)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
  at scala.collection.immutable.List.foreach(List.scala:318)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
  at scala.collection.AbstractTraversable.map(Traversable.scala:105)
  at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache.refresh(newParquet.scala:269)
  at org.apache.spark.sql.parquet.ParquetRelation2.init(newParquet.scala:391)
  at org.apache.spark.sql.parquet.DefaultSource.createRelation(newParquet.scala:98)
  at org.apache.spark.sql.parquet.DefaultSource.createRelation(newParquet.scala:128)
  at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:240)
  at org.apache.spark.sql.hive.execution.CreateMetastoreDataSourceAsSelect.run(commands.scala:218)
  at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:54)
  at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:54)
  at org.apache.spark.sql.execution.ExecutedCommand.execute(commands.scala:64)
  at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:1099)
  at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:1099)
  at org.apache.spark.sql.DataFrame.saveAsTable(DataFrame.scala:1121)
  at org.apache.spark.sql.DataFrame.saveAsTable(DataFrame.scala:1071)
  at org.apache.spark.sql.DataFrame.saveAsTable(DataFrame.scala:1037)
  at org.apache.spark.sql.DataFrame.saveAsTable(DataFrame.scala:1015)

java.lang.ClassCastException: java.sql.Date cannot be cast to java.lang.Integer
  at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106)
  at org.apache.spark.sql.parquet.RowWriteSupport.writePrimitive(ParquetTableSupport.scala:215)
  at org.apache.spark.sql.parquet.RowWriteSupport.writeValue(ParquetTableSupport.scala:192)
  at org.apache.spark.sql.parquet.RowWriteSupport.write(ParquetTableSupport.scala:171)
  at org.apache.spark.sql.parquet.RowWriteSupport.write(ParquetTableSupport.scala:134)
  at parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:120)
  at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java
[Spark SQL 1.3.1] data frame saveAsTable returns exception
Hi,

I am using Spark SQL 1.3.1. I have created a dataFrame using the jdbc data source and am using the saveAsTable() method, but got the following 2 exceptions:

java.lang.RuntimeException: Unsupported datatype DecimalType()
  at scala.sys.package$.error(package.scala:27)
  at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$fromDataType$2.apply(ParquetTypes.scala:372)
  at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$fromDataType$2.apply(ParquetTypes.scala:316)
  at scala.Option.getOrElse(Option.scala:120)
  at org.apache.spark.sql.parquet.ParquetTypesConverter$.fromDataType(ParquetTypes.scala:315)
  at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$4.apply(ParquetTypes.scala:395)
  at org.apache.spark.sql.parquet.ParquetTypesConverter$$anonfun$4.apply(ParquetTypes.scala:394)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
  at scala.collection.immutable.List.foreach(List.scala:318)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
  at scala.collection.AbstractTraversable.map(Traversable.scala:105)
  at org.apache.spark.sql.parquet.ParquetTypesConverter$.convertFromAttributes(ParquetTypes.scala:393)
  at org.apache.spark.sql.parquet.ParquetTypesConverter$.writeMetaData(ParquetTypes.scala:440)
  at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache.prepareMetadata(newParquet.scala:260)
  at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$6.apply(newParquet.scala:276)
  at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache$$anonfun$6.apply(newParquet.scala:269)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
  at scala.collection.immutable.List.foreach(List.scala:318)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
  at scala.collection.AbstractTraversable.map(Traversable.scala:105)
  at org.apache.spark.sql.parquet.ParquetRelation2$MetadataCache.refresh(newParquet.scala:269)
  at org.apache.spark.sql.parquet.ParquetRelation2.init(newParquet.scala:391)
  at org.apache.spark.sql.parquet.DefaultSource.createRelation(newParquet.scala:98)
  at org.apache.spark.sql.parquet.DefaultSource.createRelation(newParquet.scala:128)
  at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:240)
  at org.apache.spark.sql.hive.execution.CreateMetastoreDataSourceAsSelect.run(commands.scala:218)
  at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:54)
  at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:54)
  at org.apache.spark.sql.execution.ExecutedCommand.execute(commands.scala:64)
  at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:1099)
  at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:1099)
  at org.apache.spark.sql.DataFrame.saveAsTable(DataFrame.scala:1121)
  at org.apache.spark.sql.DataFrame.saveAsTable(DataFrame.scala:1071)
  at org.apache.spark.sql.DataFrame.saveAsTable(DataFrame.scala:1037)
  at org.apache.spark.sql.DataFrame.saveAsTable(DataFrame.scala:1015)

java.lang.ClassCastException: java.sql.Date cannot be cast to java.lang.Integer
  at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106)
  at org.apache.spark.sql.parquet.RowWriteSupport.writePrimitive(ParquetTableSupport.scala:215)
  at org.apache.spark.sql.parquet.RowWriteSupport.writeValue(ParquetTableSupport.scala:192)
  at org.apache.spark.sql.parquet.RowWriteSupport.write(ParquetTableSupport.scala:171)
  at org.apache.spark.sql.parquet.RowWriteSupport.write(ParquetTableSupport.scala:134)
  at parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:120)
  at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:81)
  at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:37)
  at org.apache.spark.sql.parquet.ParquetRelation2.org$apache$spark$sql$parquet$ParquetRelation2$$writeShard$1(newParquet.scala:671)
  at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$insert$2.apply(newParquet.scala:689)
  at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$insert$2.apply(newParquet.scala:689)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
  at org.apache.spark.scheduler.Task.run(Task.scala:64)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
Re: [Spark SQL 1.3.1] data frame saveAsTable returns exception
Hi,

I am using spark-shell and the steps using which I can reproduce the issue are as follows:

scala> val dateDimDF = sqlContext.load("jdbc", Map("url" -> "jdbc:teradata://192.168.145.58/DBS_PORT=1025,DATABASE=BENCHQADS,LOB_SUPPORT=OFF,USER=BENCHQADS,PASSWORD=abc", "dbtable" -> "date_dim"))

scala> dateDimDF.printSchema()
root
 |-- d_date_sk: integer (nullable = false)
 |-- d_date_id: string (nullable = false)
 |-- d_date: date (nullable = true)
 |-- d_month_seq: integer (nullable = true)
 |-- d_week_seq: integer (nullable = true)
 |-- d_quarter_seq: integer (nullable = true)
 |-- d_year: integer (nullable = true)
 |-- d_dow: integer (nullable = true)
 |-- d_moy: integer (nullable = true)
 |-- d_dom: integer (nullable = true)
 |-- d_qoy: integer (nullable = true)
 |-- d_fy_year: integer (nullable = true)
 |-- d_fy_quarter_seq: integer (nullable = true)
 |-- d_fy_week_seq: integer (nullable = true)
 |-- d_day_name: string (nullable = true)
 |-- d_quarter_name: string (nullable = true)
 |-- d_holiday: string (nullable = true)
 |-- d_weekend: string (nullable = true)
 |-- d_following_holiday: string (nullable = true)
 |-- d_first_dom: integer (nullable = true)
 |-- d_last_dom: integer (nullable = true)
 |-- d_same_day_ly: integer (nullable = true)
 |-- d_same_day_lq: integer (nullable = true)
 |-- d_current_day: string (nullable = true)
 |-- d_current_week: string (nullable = true)
 |-- d_current_month: string (nullable = true)
 |-- d_current_quarter: string (nullable = true)
 |-- d_current_year: string (nullable = true)

scala> dateDimDF.saveAsTable("date_dim_tera_save")
15/05/13 19:57:05 INFO JDBCRDD: closed connection
15/05/13 19:57:05 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2)
java.lang.ClassCastException: java.sql.Date cannot be cast to java.lang.Integer
  at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106)
  at org.apache.spark.sql.parquet.RowWriteSupport.writePrimitive(ParquetTableSupport.scala:215)
  at org.apache.spark.sql.parquet.RowWriteSupport.writeValue(ParquetTableSupport.scala:192)
  at org.apache.spark.sql.parquet.RowWriteSupport.write(ParquetTableSupport.scala:171)
  at org.apache.spark.sql.parquet.RowWriteSupport.write(ParquetTableSupport.scala:134)
  at parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:120)
  at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:81)
  at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:37)
  at org.apache.spark.sql.parquet.ParquetRelation2.org$apache$spark$sql$parquet$ParquetRelation2$$writeShard$1(newParquet.scala:671)
  at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$insert$2.apply(newParquet.scala:689)
  at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$insert$2.apply(newParquet.scala:689)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
  at org.apache.spark.scheduler.Task.run(Task.scala:64)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:722)
15/05/13 19:57:05 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 2, localhost): java.lang.ClassCastException: java.sql.Date cannot be cast to java.lang.Integer
  at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106)
  at org.apache.spark.sql.parquet.RowWriteSupport.writePrimitive(ParquetTableSupport.scala:215)
  at org.apache.spark.sql.parquet.RowWriteSupport.writeValue(ParquetTableSupport.scala:192)
  at org.apache.spark.sql.parquet.RowWriteSupport.write(ParquetTableSupport.scala:171)
  at org.apache.spark.sql.parquet.RowWriteSupport.write(ParquetTableSupport.scala:134)
  at parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:120)
  at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:81)
  at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:37)
  at org.apache.spark.sql.parquet.ParquetRelation2.org$apache$spark$sql$parquet$ParquetRelation2$$writeShard$1(newParquet.scala:671)
  at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$insert$2.apply(newParquet.scala:689)
  at org.apache.spark.sql.parquet.ParquetRelation2$$anonfun$insert$2.apply(newParquet.scala:689)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
  at org.apache.spark.scheduler.Task.run(Task.scala:64)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:722)

scala> val
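One possible workaround (a sketch, not from this thread): since the Spark 1.3.x Parquet writer rejects DateType here (and the DecimalType seen in the earlier exception), cast the offending column to string before calling saveAsTable. The data frame dateDimDF and the column d_date come from the reproduction above; the cast-and-select pattern is standard Spark 1.3 DataFrame API.

```scala
import org.apache.spark.sql.types.StringType

// Rebuild the projection, casting only the unsupported date column;
// every other column passes through unchanged.
val cols = dateDimDF.columns.map {
  case "d_date" => dateDimDF("d_date").cast(StringType).as("d_date")
  case other    => dateDimDF(other)
}
val safeDF = dateDimDF.select(cols: _*)

// The Parquet writer now only sees types it supports.
safeDF.saveAsTable("date_dim_tera_save")
```

The same pattern applies to the decimal columns: cast them to StringType (or DoubleType, accepting precision loss) before the save.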
Re: Unable to join table across data sources using sparkSQL
Finally got it working. I was trying to access Hive using the JDBC driver, the same way I was accessing Teradata. It took me some time to figure out that the default sqlContext created by Spark supports Hive, and that it uses the hive-site.xml in the Spark conf folder to access Hive. I had to select my database in Hive:

spark-shell> sqlContext.sql("use terradata_live")

Then I registered my Teradata database tables as temporary tables:

spark-shell> val itemDF = hc.load("jdbc", Map("url" -> "jdbc:teradata://192.168.145.58/DBS_PORT=1025,DATABASE=BENCHQADS,LOB_SUPPORT=OFF,USER=BENCHQADS,PASSWORD=", "dbtable" -> "item"))
spark-shell> itemDF.registerTempTable("itemterra")
spark-shell> sqlContext.sql("select store_sales.* from store_sales join itemterra on (store_sales.id = itemterra.sales_id)")

But there seems to be some issue when I try to do the same using the Hive JDBC driver. Another difference that I found was in the printSchema() output: the output for a data frame created using the Hive driver prefixes the column names with the table name, but the same does not happen for Teradata tables.
RE: Unable to join table across data sources using sparkSQL
Hi Ankit,

printSchema() works fine for all the tables.

hiveStoreSalesDF.printSchema()
root
 |-- store_sales.ss_sold_date_sk: integer (nullable = true)
 |-- store_sales.ss_sold_time_sk: integer (nullable = true)
 |-- store_sales.ss_item_sk: integer (nullable = true)
 |-- store_sales.ss_customer_sk: integer (nullable = true)
 |-- store_sales.ss_cdemo_sk: integer (nullable = true)
 |-- store_sales.ss_hdemo_sk: integer (nullable = true)
 |-- store_sales.ss_addr_sk: integer (nullable = true)
 |-- store_sales.ss_store_sk: integer (nullable = true)
 |-- store_sales.ss_promo_sk: integer (nullable = true)
 |-- store_sales.ss_ticket_number: integer (nullable = true)
 |-- store_sales.ss_quantity: integer (nullable = true)
 |-- store_sales.ss_wholesale_cost: double (nullable = true)
 |-- store_sales.ss_list_price: double (nullable = true)
 |-- store_sales.ss_sales_price: double (nullable = true)
 |-- store_sales.ss_ext_discount_amt: double (nullable = true)
 |-- store_sales.ss_ext_sales_price: double (nullable = true)
 |-- store_sales.ss_ext_wholesale_cost: double (nullable = true)
 |-- store_sales.ss_ext_list_price: double (nullable = true)
 |-- store_sales.ss_ext_tax: double (nullable = true)
 |-- store_sales.ss_coupon_amt: double (nullable = true)
 |-- store_sales.ss_net_paid: double (nullable = true)
 |-- store_sales.ss_net_paid_inc_tax: double (nullable = true)
 |-- store_sales.ss_net_profit: double (nullable = true)

dateDimDF.printSchema()
root
 |-- d_date_sk: integer (nullable = false)
 |-- d_date_id: string (nullable = false)
 |-- d_date: date (nullable = true)
 |-- d_month_seq: integer (nullable = true)
 |-- d_week_seq: integer (nullable = true)
 |-- d_quarter_seq: integer (nullable = true)
 |-- d_year: integer (nullable = true)
 |-- d_dow: integer (nullable = true)
 |-- d_moy: integer (nullable = true)
 |-- d_dom: integer (nullable = true)
 |-- d_qoy: integer (nullable = true)
 |-- d_fy_year: integer (nullable = true)
 |-- d_fy_quarter_seq: integer (nullable = true)
 |-- d_fy_week_seq: integer (nullable = true)
 |-- d_day_name: string (nullable = true)
 |-- d_quarter_name: string (nullable = true)
 |-- d_holiday: string (nullable = true)
 |-- d_weekend: string (nullable = true)
 |-- d_following_holiday: string (nullable = true)
 |-- d_first_dom: integer (nullable = true)
 |-- d_last_dom: integer (nullable = true)
 |-- d_same_day_ly: integer (nullable = true)
 |-- d_same_day_lq: integer (nullable = true)
 |-- d_current_day: string (nullable = true)
 |-- d_current_week: string (nullable = true)
 |-- d_current_month: string (nullable = true)
 |-- d_current_quarter: string (nullable = true)
 |-- d_current_year: string (nullable = true)

itemDF.printSchema()
root
 |-- i_item_sk: integer (nullable = false)
 |-- i_item_id: string (nullable = false)
 |-- i_rec_start_date: date (nullable = true)
 |-- i_rec_end_date: date (nullable = true)
 |-- i_item_desc: string (nullable = true)
 |-- i_current_price: decimal (nullable = true)
 |-- i_wholesale_cost: decimal (nullable = true)
 |-- i_brand_id: integer (nullable = true)
 |-- i_brand: string (nullable = true)
 |-- i_class_id: integer (nullable = true)
 |-- i_class: string (nullable = true)
 |-- i_category_id: integer (nullable = true)
 |-- i_category: string (nullable = true)
 |-- i_manufact_id: integer (nullable = true)
 |-- i_manufact: string (nullable = true)
 |-- i_size: string (nullable = true)
 |-- i_formulation: string (nullable = true)
 |-- i_color: string (nullable = true)
 |-- i_units: string (nullable = true)
 |-- i_container: string (nullable = true)
 |-- i_manager_id: integer (nullable = true)
 |-- i_product_name: string (nullable = true)

Regards,
Ishwardeep

From: ankitjindal [via Apache Spark User List]
Sent: Tuesday, May 5, 2015 5:00 PM
To: Ishwardeep Singh
Subject: RE: Unable to join table across data sources using sparkSQL

Just check the schema of both the tables using frame.printSchema();
RE: Unable to join table across data sources using sparkSQL
Hi,

I am using Spark 1.3.0. I was able to join a JSON file on HDFS registered as a TempTable with a table in MySQL. On the same lines I tried to join a table in Hive with another table in Teradata, but I get a query parse exception.

Regards,
Ishwardeep

From: ankitjindal [via Apache Spark User List]
Sent: Tuesday, May 5, 2015 1:26 PM
To: Ishwardeep Singh
Subject: Re: Unable to join table across data sources using sparkSQL

Hi,

I was doing the same but with a file in Hadoop as a temp table and one table in SQL Server, and I succeeded in it. Which Spark version are you using currently?

Thanks,
Ankit
RE: Unable to compile spark 1.1.0 on windows 8.1
Hi Judy,

Thank you for your response.

When I try to compile using maven, "mvn -Dhadoop.version=1.2.1 -DskipTests clean package", I get the error "Error: Could not find or load main class". I have Maven 3.0.4. And when I run the command "sbt package" I get the same exception as earlier.

I have done the following steps:

1. Downloaded spark-1.1.0.tgz from the Spark site and unzipped it to the folder d:\myworkplace\software\spark-1.1.0
2. Downloaded sbt-0.13.7.zip and extracted it to the folder d:\myworkplace\software\sbt
3. Updated the PATH environment variable to include d:\myworkplace\software\sbt\bin
4. Navigated to the Spark folder d:\myworkplace\software\spark-1.1.0
5. Ran the command "sbt assembly"
6. As a side effect of this command a number of libraries are downloaded, and I get an initial error that the path C:\Users\ishwardeep.singh\.sbt\0.13\staging\ec3aa8f39111944cc5f2\sbt-pom-reader does not exist.
7. I manually created this subfolder ec3aa8f39111944cc5f2\sbt-pom-reader and retried, and got the next error as described in my initial post.

Is this the correct procedure to compile Spark 1.1.0? Please let me know. Hoping to hear from you soon.

Regards,
ishwardeep
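For what it's worth, the Maven route below is a sketch of the documented way to build Spark 1.1.0 (the MAVEN_OPTS values are the ones the Spark build docs of that era recommend; run from the unpacked source directory in a Windows cmd shell). A missing or undersized MAVEN_OPTS is a common cause of build failures here:

```bat
rem Give Maven enough heap and permgen, then build against Hadoop 1.2.1,
rem skipping the tests.
set MAVEN_OPTS=-Xmx2g -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=512m
mvn -Dhadoop.version=1.2.1 -DskipTests clean package
```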
Unable to compile spark 1.1.0 on windows 8.1
Hi,

I am trying to compile Spark 1.1.0 on Windows 8.1 but I get the following exception:

[info] Compiling 3 Scala sources to D:\myworkplace\software\spark-1.1.0\project\target\scala-2.10\sbt0.13\classes...
[error] D:\myworkplace\software\spark-1.1.0\project\SparkBuild.scala:26: object sbt is not a member of package com.typesafe
[error] import com.typesafe.sbt.pom.{PomBuild, SbtPomKeys}
[error] D:\myworkplace\software\spark-1.1.0\project\SparkBuild.scala:53: not found: type PomBuild
[error] object SparkBuild extends PomBuild {
[error] D:\myworkplace\software\spark-1.1.0\project\SparkBuild.scala:121: not found: value SbtPomKeys
[error] otherResolvers = SbtPomKeys.mvnLocalRepository(dotM2 = Seq(Resolver.file(dotM2, dotM2))),
[error] D:\myworkplace\software\spark-1.1.0\project\SparkBuild.scala:165: value projectDefinitions is not a member of AnyRef
[error] super.projectDefinitions(baseDirectory).map { x =>
[error] four errors found
[error] (plugins/compile:compile) Compilation failed

I have also set up Scala 2.10. Need help to resolve this issue.

Regards,
Ishwardeep