pyspark split pair rdd to multiple

2016-04-19 Thread pth001

Hi,

How can I split a pair RDD [K, V] into a map [K, Array(V)] efficiently in PySpark?
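
For example, a minimal sketch of the straightforward approach (assuming sc is an existing SparkContext; the sample data is hypothetical). Is groupByKey the efficient way here, or is there something better?

    # group all values per key, then bring the result back as a dict
    pairs = sc.parallelize([("a", 1), ("b", 2), ("a", 3)])
    grouped = pairs.groupByKey().mapValues(list)  # RDD of (key, [values])
    as_map = grouped.collectAsMap()               # {'a': [1, 3], 'b': [2]}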

Best,
Patcharee

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



dataframe access hive complex type

2016-01-19 Thread pth001

Hi,

Which DataFrame API can access Hive complex types (Struct, Array, Map)?
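
For example, a sketch of the kind of access I mean (assuming sqlContext is a HiveContext; the table and field names are hypothetical):

    df = sqlContext.table("t")
    df.select(
        df["s"]["field1"],        # struct field
        df["a"].getItem(0),       # array element by index
        df["m"].getItem("key1"),  # map value by key
    ).show()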

Thanks,
Patcharee

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



OrcNewOutputFormat write partitioned orc file

2015-11-16 Thread pth001

Hi,

How can I write a partitioned ORC file using OrcNewOutputFormat in MapReduce?
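
A sketch of the direction I am considering, not a working solution: MapReduce has no built-in Hive-style partitioning, so the idea is MultipleOutputs with a partition-shaped base path per record. The Hadoop classes below are real; producing the ORC row Writable (via OrcSerde) and the schema wiring are omitted, and the partition value is hypothetical:

    import java.io.IOException;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

    public class OrcPartitionReducer
            extends Reducer<NullWritable, Writable, NullWritable, Writable> {
        private MultipleOutputs<NullWritable, Writable> out;

        @Override
        protected void setup(Context ctx) {
            out = new MultipleOutputs<NullWritable, Writable>(ctx);
        }

        @Override
        protected void reduce(NullWritable key, Iterable<Writable> rows, Context ctx)
                throws IOException, InterruptedException {
            for (Writable row : rows) {
                // lands under <outputdir>/zone=1/part-r-NNNNN; in real code the
                // partition path would be derived from the row itself
                out.write(NullWritable.get(), row, "zone=1/part");
            }
        }

        @Override
        protected void cleanup(Context ctx) throws IOException, InterruptedException {
            out.close();
        }
    }

with the job's output format set to org.apache.hadoop.hive.ql.io.orc.OrcNewOutputFormat (for example via LazyOutputFormat.setOutputFormatClass). Would this work, or is there a supported way?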

Thanks
Patcharee


override log4j level

2015-11-16 Thread pth001

Hi,

How can I override the log4j level using --hiveconf? I want to use the ERROR
level for some tasks.
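
For example, something like this (a sketch; hive.root.logger is the standard Hive log4j property, and the query is hypothetical):

    hive --hiveconf hive.root.logger=ERROR,console -e "select count(*) from t"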


Thanks,
Patcharee


Re: character '' not supported here

2015-07-18 Thread pth001

Hi,

The query result (tab delimiters were lost when the message was archived, so the columns run together):

11236119012.64043-5.9708868.5592070.0 0.0 0.0-19.6869931308.804799848.00.006196644 0.00.0 301.274750.382470460.0NULL11 20081
11236122012.513598-6.36717137.3927946 0.0 0.00.0-22.3003921441.054799848.0 0.00508465060.0 0.0112.207870.304595230.0 NULL1120081
5122503682415.1955.1722354.9027147 -0.0244086120.023590.553-38.96928-1130.0469 74660.54 2.5969802E-49.706164E-1123054.2680.0 0.241967370.0 NULL1120081
9121449412.25196412.081688-9.594620.0 0.0 0.0-25.93576258.6562599848.00.0021708217 0.00.0 1.29632131.15602660.0NULL11 20081
9121458412.3020987.752461-12.183463 0.0 0.00.0-24.983763351.195399848.0 0.00237235990.0 0.01.41373750.992398860.0 NULL1120081


I stored the table in ORC format, partitioned and compressed with ZLIB. The
problem happened just after I concatenated the table.


BR,
Patcharee

On 18/07/15 12:46, Nitin Pawar wrote:
select * without a where clause will work because it does not involve file
processing.
I suspect the problem is with the field delimiter, so I asked for records
so that we can see what the data in each column is.


Are you using a CSV file with columns delimited by some character, and does it
have numeric data in quotes?


On Sat, Jul 18, 2015 at 3:58 PM, patcharee patcharee.thong...@uni.no wrote:


This select * from table limit 5; works, but other queries do not. So?

Patcharee


On 18. juli 2015 12:08, Nitin Pawar wrote:

can you do select * from table limit 5;

On Sat, Jul 18, 2015 at 3:35 PM, patcharee patcharee.thong...@uni.no wrote:

Hi,

I am using Hive 0.14 with the Tez engine and have found a weird problem.
Any suggestions?

hive> select count(*) from 4D;
line 1:1 character '' not supported here
line 1:2 character '' not supported here
line 1:3 character '' not supported here
line 1:4 character '' not supported here
line 1:5 character '' not supported here
line 1:6 character '' not supported here
line 1:7 character '' not supported here
line 1:8 character '' not supported here
line 1:9 character '' not supported here
...
...
line 1:131 character '' not supported here
line 1:132 character '' not supported here
line 1:133 character '' not supported here
line 1:134 character '' not supported here
line 1:135 character '' not supported here
line 1:136 character '' not supported here
line 1:137 character '' not supported here
line 1:138 character '' not supported here
line 1:139 character '' not supported here
line 1:140 character '' not supported here
line 1:141 character '' not supported here
line 1:142 character '' not supported here
line 1:143 character '' not supported here
line 1:144 character '' not supported here
line 1:145 character '' not supported here
line 1:146 character '' not supported here

BR,
Patcharee





-- 
Nitin Pawar





--
Nitin Pawar




alter table on multiple partitions

2015-06-30 Thread pth001

Hi,

I have a table partitioned by columns a, b, c, and d. I want to run
concatenation (ALTER TABLE ... CONCATENATE) on this table. Is it possible to
use a wildcard in the alter command to alter several partitions at a time?
For example:


alter table TestHive partition (a=1, b=*, c=2, d=*) CONCATENATE;
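
For comparison, the one-partition-at-a-time form that I know works (partition values hypothetical):

    alter table TestHive partition (a=1, b=1, c=2, d=1) CONCATENATE;
    alter table TestHive partition (a=1, b=2, c=2, d=1) CONCATENATE;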

BR,
Patcharee


How to use KryoSerializer : ClassNotFoundException

2015-06-24 Thread pth001

Hi,

I am using Spark 1.4. I want to serialize with KryoSerializer, but I get a
ClassNotFoundException. The configuration and exception are below. When I
submitted the job, I also provided --jars mylib.jar, which contains
WRFVariableZ.


conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
conf.registerKryoClasses(Array(classOf[WRFVariableZ]))

Exception in thread "main" org.apache.spark.SparkException: Failed to register classes with Kryo
at org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:114)
Caused by: java.lang.ClassNotFoundException: no.uni.computing.io.WRFVariableZ


How can I configure it?
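
For example, would something like this be the right submission (a sketch; the class and application jar names are hypothetical)? I wonder whether the jar must also be on the driver classpath, since class registration happens when the SparkContext is created:

    spark-submit --class MyApp \
      --jars mylib.jar \
      --driver-class-path mylib.jar \
      myapp.jar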

BR,
Patcharee

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



memory needed for each executor

2015-06-21 Thread pth001

Hi,

How can I know the amount of memory needed by each executor (with one core) to
execute a job? If there are many cores per executor, will the memory needed be
the product (memory needed for one core * number of cores)?


Any suggestions/guidelines?
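
To make the question concrete (a sketch; the flag values and jar name are hypothetical):

    spark-submit \
      --executor-memory 4g \
      --executor-cores 4 \
      myapp.jar

Does this give one 4g heap shared by up to 4 concurrent tasks (roughly 1g per task), or 4g per core?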

BR,
Patcharee

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Dataframe Write : Tables created with SQLContext must be TEMPORARY. Use a HiveContext instead.

2015-06-13 Thread pth001

I got it. Thanks!
Patcharee

On 13/06/15 23:00, Will Briggs wrote:

The context that is created by spark-shell is actually an instance of 
HiveContext. If you want to use it programmatically in your driver, you need to 
make sure that your context is a HiveContext, and not a SQLContext.

https://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables
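
A minimal sketch of the fix (Spark 1.x; assuming sc is your existing SparkContext):

    import org.apache.spark.sql.hive.HiveContext
    val sqlContext = new HiveContext(sc)  // a HiveContext, not a plain SQLContext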

Hope this helps,
Will

On June 13, 2015, at 3:36 PM, pth001 patcharee.thong...@uni.no wrote:

Hi,

I am using Spark 1.4. I am trying to insert data into a Hive table (in ORC
format) from a DataFrame.

partitionedTestDF.write.format("org.apache.spark.sql.hive.orc.DefaultSource")
  .mode(org.apache.spark.sql.SaveMode.Append).partitionBy("zone", "z", "year", "month").saveAsTable("testorc")

When this job is submitted by spark-submit I get:
Exception in thread "main" java.lang.RuntimeException: Tables created
with SQLContext must be TEMPORARY. Use a HiveContext instead

But the job works fine on spark-shell. What can be wrong?

BR,
Patcharee

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org




-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Dataframe Write : Tables created with SQLContext must be TEMPORARY. Use a HiveContext instead.

2015-06-13 Thread pth001

Hi,

I am using Spark 1.4. I am trying to insert data into a Hive table (in ORC
format) from a DataFrame.


partitionedTestDF.write.format("org.apache.spark.sql.hive.orc.DefaultSource")
  .mode(org.apache.spark.sql.SaveMode.Append).partitionBy("zone", "z", "year", "month").saveAsTable("testorc")

When this job is submitted by spark-submit I get:
Exception in thread "main" java.lang.RuntimeException: Tables created
with SQLContext must be TEMPORARY. Use a HiveContext instead


But the job works fine on spark-shell. What can be wrong?

BR,
Patcharee

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



ERROR 2135: Received error from store function.Premature EOF: no length prefix available

2015-06-09 Thread pth001

Hi,

My Pig job on Tez (storing a dataset into a partitioned Hive table) throws
the exception below. What can be wrong? How can I fix it?


2015-06-09 10:59:57,268 ERROR [TezChild] runtime.PigProcessor: Encountered exception while processing:
org.apache.pig.backend.executionengine.ExecException: ERROR 2135: Received error from store function. Premature EOF: no length prefix available
at org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POStoreTez.getNextTuple(POStoreTez.java:141)
at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.runPipeline(PigProcessor.java:316)
at org.apache.pig.backend.hadoop.executionengine.tez.runtime.PigProcessor.run(PigProcessor.java:195)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:324)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:176)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:168)
at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.call(TezTaskRunner.java:163)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.EOFException: Premature EOF: no length prefix available
at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2208)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1440)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1362)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:589)


BR,
Patcharee


filter by query result

2015-05-27 Thread pth001

Hi,

I am new to Pig. First I queried a Hive table (x = LOAD 'x' USING
org.apache.hive.hcatalog.pig.HCatLoader();) and got a single
record/value. How can I use this single value as a filter in another
query? I hope to get better performance by filtering as early as possible.
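
For example, is a scalar projection the right approach (a sketch; the second table and the field names are hypothetical)?

    x = LOAD 'x' USING org.apache.hive.hcatalog.pig.HCatLoader();
    y = LOAD 'y' USING org.apache.hive.hcatalog.pig.HCatLoader();
    -- x holds a single record, so x.cutoff can be referenced as a scalar
    filtered = FILTER y BY value > x.cutoff;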


BR,
Patcharee


create a pipeline

2015-04-15 Thread pth001

Hi,

How can I create a pipeline (containing a sequence of Pig scripts)?
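
For example, would a plain shell wrapper like this be a reasonable approach (file names hypothetical; each stage runs only if the previous one succeeds), or is there something more integrated?

    pig -f stage1.pig && pig -f stage2.pig && pig -f stage3.pig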

BR,
Patcharee