Re: How to insert data into 2000 partitions(directories) of ORC/parquet at a time using Spark SQL?

2016-06-15 Thread swetha kasireddy
Hi Mich,

No, I have not tried that. My requirement is to do the insert from an hourly Spark batch job. How is it different from trying the insert with the Hive CLI or beeline?

Thanks,
Swetha

On Tue, Jun 14, 2016 at 10:44 AM, Mich Talebzadeh wrote: > Hi Swetha, > > Have you

Re: How to insert data into 2000 partitions(directories) of ORC/parquet at a time using Spark SQL?

2016-06-14 Thread Mich Talebzadeh
Hi Swetha,

Have you actually tried doing this in Hive using Hive CLI or beeline?

Thanks

Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Re: How to insert data into 2000 partitions(directories) of ORC/parquet at a time using Spark SQL?

2016-06-14 Thread Mich Talebzadeh
In all probability there is no user database created in Hive. Create a database yourself:

sql("CREATE DATABASE IF NOT EXISTS test")

It would also help to grasp some of the basic concepts of Hive databases.

HTH

Dr Mich Talebzadeh
LinkedIn:

Re: How to insert data into 2000 partitions(directories) of ORC/parquet at a time using Spark SQL?

2016-06-14 Thread swetha kasireddy
Hi Bijay,

This approach might not work for me, as I have to do partial inserts/overwrites in a given table, and data_frame.write.partitionBy will overwrite the entire table.

Thanks,
Swetha

On Mon, Jun 13, 2016 at 9:25 PM, Bijay Pathak wrote: > Hi Swetha, > > One
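Swetha's constraint above, overwriting only some partitions while leaving the rest of the table intact, usually comes down to the fact that Hive lays a partitioned table out as one directory per partition, so a partial overwrite only needs to touch the matching directories. A minimal sketch of that layout, using the partition column names from the code snippets elsewhere in this thread; the base path, helper name, and example values are illustrative assumptions, not from the thread:

```python
def partition_path(base_dir, id_part, dt_part):
    """Build the Hive-style partition directory for one (id, dt) pair.

    Hive stores partitioned tables as key=value subdirectories, which is
    why overwriting a single partition only touches its own directory.
    The column names follow the thread's snippet; the rest is illustrative.
    """
    return "{0}/idPartitioner={1}/dtPartitioner={2}".format(
        base_dir, id_part, dt_part)


# Example: the directory a partial overwrite of one partition would target.
print(partition_path("/user/hive/warehouse/userrecords", "42", "2016-06-14"))
```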

Re: How to insert data into 2000 partitions(directories) of ORC/parquet at a time using Spark SQL?

2016-06-14 Thread Sree Eedupuganti
Hi Spark users,

I am new to Spark. I am trying to connect to Hive using a JavaSparkContext but am unable to connect to the database; by executing the code below I can see only the "default" database. Can anyone help me out? What I need is a sample program for querying Hive results using a JavaSparkContext. Need to

Re: How to insert data into 2000 partitions(directories) of ORC/parquet at a time using Spark SQL?

2016-06-13 Thread Bijay Pathak
Hi Swetha,

One option is to use a Hive version with the above issue fixed, which is Hive 2.0, or Cloudera CDH Hive 1.2, which has the issue resolved. One thing to remember is that it is not the Hive you have installed that matters but the Hive version Spark is using, which in Spark 1.6 is Hive 1.2 as of now. The workaround I

Re: How to insert data into 2000 partitions(directories) of ORC/parquet at a time using Spark SQL?

2016-06-13 Thread swetha kasireddy
Hi Mich,

Following is a sample code snippet:

val userDF = userRecsDF.toDF("idPartitioner", "dtPartitioner", "userId", "userRecord").persist()
System.out.println(" userRecsDF.partitions.size" + userRecsDF.partitions.size)
userDF.registerTempTable("userRecordsTemp")
sqlContext.sql("SET
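The snippet breaks off at the SET statement. For context, a dynamic-partition insert of this shape normally needs Hive's dynamic-partitioning settings raised before the INSERT OVERWRITE, since 2000 partitions exceeds Hive's defaults. The property names below are standard Hive configuration; the exact limit values, target table name, and SELECT list are assumptions sketched from the column names in the snippet above:

```sql
-- Enable dynamic partitioning and raise the default partition limits
-- (Hive's defaults are too low for a 2000-partition insert).
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
SET hive.exec.max.dynamic.partitions = 2000;
SET hive.exec.max.dynamic.partitions.pernode = 2000;

-- Partition columns must come last in the SELECT, in PARTITION-clause order.
INSERT OVERWRITE TABLE userRecords PARTITION (idPartitioner, dtPartitioner)
SELECT userId, userRecord, idPartitioner, dtPartitioner FROM userRecordsTemp;
```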

Re: How to insert data into 2000 partitions(directories) of ORC/parquet at a time using Spark SQL?

2016-06-13 Thread swetha kasireddy
Hi Bijay,

If I am hitting this issue, https://issues.apache.org/jira/browse/HIVE-11940, what needs to be done? Is upgrading to a higher version of Hive the only solution?

Thanks!

On Mon, Jun 13, 2016 at 10:47 AM, swetha kasireddy < swethakasire...@gmail.com> wrote: > Hi, > > Following is a

Re: How to insert data into 2000 partitions(directories) of ORC/parquet at a time using Spark SQL?

2016-06-13 Thread swetha kasireddy
Hi,

Following is a sample code snippet:

val userDF = userRecsDF.toDF("idPartitioner", "dtPartitioner", "userId", "userRecord").persist()
System.out.println(" userRecsDF.partitions.size" + userRecsDF.partitions.size)
userDF.registerTempTable("userRecordsTemp")
sqlContext.sql("SET

Re: How to insert data into 2000 partitions(directories) of ORC/parquet at a time using Spark SQL?

2016-06-10 Thread Bijay Pathak
Hello,

Looks like you are hitting this: https://issues.apache.org/jira/browse/HIVE-11940.

Thanks,
Bijay

On Thu, Jun 9, 2016 at 9:25 PM, Mich Talebzadeh wrote: > cam you provide a code snippet of how you are populating the target table > from temp table. > > > HTH

Re: How to insert data into 2000 partitions(directories) of ORC/parquet at a time using Spark SQL?

2016-06-09 Thread Mich Talebzadeh
Can you provide a code snippet of how you are populating the target table from the temp table?

HTH

Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Re: How to insert data into 2000 partitions(directories) of ORC/parquet at a time using Spark SQL?

2016-06-09 Thread swetha kasireddy
No, I am reading the data from HDFS, transforming it, registering the data in a temp table using registerTempTable, and then doing an insert overwrite using Spark SQL's HiveContext.

On Thu, Jun 9, 2016 at 3:40 PM, Mich Talebzadeh wrote: > how are you doing the insert?

Re: How to insert data into 2000 partitions(directories) of ORC/parquet at a time using Spark SQL?

2016-06-09 Thread Mich Talebzadeh
How are you doing the insert? From an existing table?

Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com

On

Re: How to insert data into 2000 partitions(directories) of ORC/parquet at a time using Spark SQL?

2016-06-09 Thread swetha kasireddy
400 cores are assigned to this job.

On Thu, Jun 9, 2016 at 1:16 PM, Stephen Boesch wrote: > How many workers (/cpu cores) are assigned to this job? > > 2016-06-09 13:01 GMT-07:00 SRK : > >> Hi, >> >> How to insert data into 2000

Re: How to insert data into 2000 partitions(directories) of ORC/parquet at a time using Spark SQL?

2016-06-09 Thread Stephen Boesch
How many workers (/cpu cores) are assigned to this job?

2016-06-09 13:01 GMT-07:00 SRK : > Hi, > > How to insert data into 2000 partitions(directories) of ORC/parquet at a > time using Spark SQL? It seems to be not performant when I try to insert > 2000 directories of

How to insert data into 2000 partitions(directories) of ORC/parquet at a time using Spark SQL?

2016-06-09 Thread SRK
Hi,

How to insert data into 2000 partitions (directories) of ORC/Parquet at a time using Spark SQL? It does not seem to be performant when I try to insert into 2000 directories of Parquet/ORC using Spark SQL. Did anyone face this issue?

Thanks!
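One mitigation that often helps with this kind of load, offered here as a hedged sketch rather than something prescribed in this thread: split the 2000 partitions into smaller batches and run one insert per batch, so that no single job has to commit thousands of directories (file moves plus metastore updates) at once. The helper below shows only the batching logic; the batch size and names are illustrative assumptions:

```python
def batch_partitions(partitions, batch_size):
    """Split a large list of partition keys into smaller insert batches.

    Running one INSERT per batch keeps each job's commit phase small,
    instead of a single job touching 2000 partition directories.
    """
    return [partitions[i:i + batch_size]
            for i in range(0, len(partitions), batch_size)]


# Example: 2000 partition ids in batches of 200 -> 10 separate insert jobs.
parts = ["idPartitioner={0}".format(i) for i in range(2000)]
batches = batch_partitions(parts, 200)
print(len(batches))  # 10
```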