pyspark split pair rdd to multiple

2016-04-19 Thread pth001
Hi, how can I efficiently turn a pair RDD of [K, V] into [K, Array(V)] (grouping the values per key) in PySpark? Best, Patcharee
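
A minimal PySpark sketch of the usual answer, groupByKey plus mapValues(list); it assumes each key's values fit in memory, and aggregateByKey is preferable when a reduction would do instead of the full lists:

    rdd = sc.parallelize([("a", 1), ("a", 2), ("b", 3)])
    grouped = rdd.groupByKey().mapValues(list)  # (K, V) pairs -> (K, list of V)
    print(grouped.collect())                    # [('a', [1, 2]), ('b', [3])]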

dataframe access hive complex type

2016-01-19 Thread pth001
Hi, how can a DataFrame access Hive complex types (Struct, Array, Map), and with what API? Thanks, Patcharee
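
For reference, a sketch of the DataFrame syntax for complex columns, using a hypothetical Hive table t with struct column s, array column arr, and map column m (reading a Hive table requires a HiveContext):

    df = sqlContext.table("t")
    df.select(df.s.field1)                    # struct field via dot notation
    df.select(df.arr[0])                      # array element via index
    df.select(df.m["key"])                    # map value via key lookup
    df.selectExpr("explode(arr) AS element")  # flatten an array into rows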

OrcNewOutputFormat write partitioned orc file

2015-11-16 Thread pth001
Hi, how can I write partitioned ORC files using OrcNewOutputFormat in MapReduce? Thanks, Patcharee
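
A hedged note, not from the thread: MapReduce itself has no partition concept, so a common pattern is to combine the ORC output format with org.apache.hadoop.mapreduce.lib.output.MultipleOutputs, write each record to a Hive-style relative path such as zone=1/year=2015/part, and afterwards register the directories with ALTER TABLE ... ADD PARTITION.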

override log4j level

2015-11-16 Thread pth001
Hi, how can I override the log4j level using --hiveconf? I want to use the ERROR level for some tasks. Thanks, Patcharee
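
For the Hive CLI the usual knob is the root logger property, e.g. hive --hiveconf hive.root.logger=ERROR,console (hedged: exact property names vary across Hive versions).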

Re: character '' not supported here

2015-07-18 Thread pth001
Hi, the query result: [numeric rows garbled in the archive; several float columns, zeros, and a NULL value; the original column layout is not recoverable]

alter table on multiple partitions

2015-06-30 Thread pth001
Hi, I have a table partitioned by columns a, b, c, and d, and I want to run ALTER TABLE ... CONCATENATE on it. Is it possible to use a wildcard in the ALTER command so that several partitions are altered at a time? For example: ALTER TABLE TestHive PARTITION (a=1, b=*, c=2, d=*) CONCATENATE; BR, Patcharee
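
Hive's ALTER syntax takes only concrete partition specs (at least in the Hive versions of that era), so a common workaround is to enumerate the partitions and concatenate them one by one. A hypothetical PySpark sketch via a HiveContext, with table and column values taken from the example above:

    # List partitions, keep the ones matching a=1 and c=2, concatenate each.
    for row in sqlContext.sql("SHOW PARTITIONS TestHive").collect():
        parts = dict(p.split("=") for p in row[0].split("/"))  # 'a=1/b=5/...' -> dict
        if parts["a"] == "1" and parts["c"] == "2":
            spec = ", ".join("%s='%s'" % kv for kv in parts.items())
            sqlContext.sql("ALTER TABLE TestHive PARTITION (%s) CONCATENATE" % spec)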

How to use KryoSerializer : ClassNotFoundException

2015-06-24 Thread pth001
Hi, I am using Spark 1.4. I wanted to serialize with KryoSerializer but got a ClassNotFoundException. The configuration and exception are below. When I submitted the job, I also provided --jars mylib.jar, which contains WRFVariableZ. conf.set("spark.serializer",
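
A hedged PySpark sketch of the relevant settings; one commonly reported fix for this ClassNotFoundException is putting the jar on the executor classpath explicitly (not only shipping it with --jars) and registering the class by its fully qualified name:

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
            # class and jar names from the thread; the package prefix is not shown there
            .set("spark.kryo.classesToRegister", "WRFVariableZ")
            .set("spark.executor.extraClassPath", "mylib.jar"))
    sc = SparkContext(conf=conf)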

memory needed for each executor

2015-06-21 Thread pth001
Hi, how can I know how much memory each executor (with one core) needs to execute a job? If there are many cores per executor, will the required memory be the product (memory needed for a one-core executor * number of cores)? Any suggestions/guidelines? BR, Patcharee
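
For what it's worth: spark.executor.memory is allocated per executor and shared by all of its concurrently running tasks, so the budget does scale roughly with the core count (plus per-executor overhead). A sketch of the two settings involved:

    from pyspark import SparkConf

    # 4g shared by 2 concurrent tasks ~= 2g per running task; doubling
    # spark.executor.cores roughly doubles what one executor needs.
    conf = (SparkConf()
            .set("spark.executor.memory", "4g")
            .set("spark.executor.cores", "2"))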

Re: Dataframe Write : Tables created with SQLContext must be TEMPORARY. Use a HiveContext instead.

2015-06-13 Thread pth001
http://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables Hope this helps, Will. On June 13, 2015, at 3:36 PM, pth001 patcharee.thong...@uni.no wrote: Hi, I am using spark 0.14. I am trying to insert data into a Hive table (in ORC format) from a DataFrame. partitionedTestDF.write.format
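
The error message names the fix: saveAsTable into a persistent Hive table needs a HiveContext, not a plain SQLContext. A hedged sketch of the quoted write:

    from pyspark.sql import HiveContext

    sqlContext = HiveContext(sc)  # a plain SQLContext cannot create persistent tables
    (partitionedTestDF.write
        .format("orc")  # shorthand; the thread spells out the DefaultSource class
        .mode("append")
        .partitionBy("zone", "z", "year", "month")
        .saveAsTable("testorc"))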

Dataframe Write : Tables created with SQLContext must be TEMPORARY. Use a HiveContext instead.

2015-06-13 Thread pth001
Hi, I am using spark 0.14. I am trying to insert data into a Hive table (in ORC format) from a DataFrame: partitionedTestDF.write.format("org.apache.spark.sql.hive.orc.DefaultSource").mode(org.apache.spark.sql.SaveMode.Append).partitionBy("zone","z","year","month").saveAsTable("testorc") When this job is submitted by

ERROR 2135: Received error from store function.Premature EOF: no length prefix available

2015-06-09 Thread pth001
Hi, my Pig on Tez job (storing a dataset into a partitioned Hive table) throws the following exception. What can be wrong? How can I fix it? 2015-06-09 10:59:57,268 ERROR [TezChild] runtime.PigProcessor: Encountered exception while processing: org.apache.pig.backend.executionengine.ExecException:
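
A hedged note, not from the thread: "Premature EOF: no length prefix available" typically surfaces when HDFS DataNodes drop write pipelines under load; checking the DataNode logs for matching errors and raising dfs.datanode.max.transfer.threads is a common remedy.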

filter by query result

2015-05-27 Thread pth001
Hi, I am new to Pig. First I queried a Hive table (x = LOAD 'x' USING org.apache.hive.hcatalog.pig.HCatLoader();) and got a single record/value. How can I use this single value to filter in another query? I hope to get better performance by filtering as early as possible. BR, Patcharee
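
Pig's scalar projection covers this: a relation known to hold a single tuple can be referenced as a scalar inside another relation's expressions, e.g. y = FILTER big BY somecol == (chararray)x.val; (hedged: the field names here are illustrative, and casting the scalar to the field's type is generally needed).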

create a pipeline

2015-04-15 Thread pth001
Hi, how can I create a pipeline (containing a sequence of Pig scripts)? BR, Patcharee
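
Besides a scheduler such as Oozie (which has a dedicated Pig action) or Grunt's exec/run commands, a minimal driver script does the job. A sketch with hypothetical script names:

    import subprocess

    # Run the Pig scripts in order; check_call aborts the pipeline on the
    # first non-zero exit status.
    for script in ["step1_load.pig", "step2_transform.pig", "step3_store.pig"]:
        subprocess.check_call(["pig", "-f", script])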