sparkR 3rd-party library

2017-09-04 Thread patcharee
"rbga" at org.apache.spark.api.r.RRunner.compute(RRunner.scala:108) at org.apache.spark.api.r.BaseRRDD.compute(RRDD.scala:51) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala Any ideas/suggestions? BR,

what contributes to Task Deserialization Time

2016-07-21 Thread patcharee
! Patcharee - To unsubscribe e-mail: user-unsubscr...@spark.apache.org

Re: pyspark split pair rdd to multiple

2016-04-20 Thread patcharee
I can also use dataframe. Any suggestions? Best, Patcharee On 20. april 2016 10:43, Gourav Sengupta wrote: Is there any reason why you are not using data frames? Regards, Gourav On Tue, Apr 19, 2016 at 8:51 PM, pth001 <patcharee.thong...@uni.no>

executor running time vs getting result from jupyter notebook

2016-04-14 Thread Patcharee Thongtra
be the factor of time spending on these steps? BR, Patcharee

kafka streaming topic partitions vs executors

2016-02-26 Thread patcharee
as the topic's partitions). However some executors are given more than 1 tasks and work on these tasks sequentially. Why Spark does not distribute these 10 tasks to 10 executors? How to do that? Thanks, Patcharee
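
A sketch of one answer: with the direct Kafka stream, Spark creates one task per topic partition, but those tasks spread over however many executor cores exist; to get one task per executor you must request matching resources. A config fragment (flag values are illustrative, assuming 10 topic partitions):

```shell
# Request as many single-core executors as the topic has partitions, so each
# of the 10 per-partition tasks can land on its own executor. Dropping
# spark.locality.wait stops the scheduler from queueing tasks on a few
# "preferred" executors instead of spreading them out.
spark-submit \
  --num-executors 10 \
  --executor-cores 1 \
  --conf spark.locality.wait=0 \
  my-streaming-app.jar
```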

Re: streaming textFileStream problem - got only ONE line

2016-01-29 Thread patcharee
I moved them every interval to the monitored directory. Patcharee On 25. jan. 2016 22:30, Shixiong(Ryan) Zhu wrote: Did you move the file into "hdfs://helmhdfs/user/patcharee/cerdata/", or write into it directly? `textFileStream` requires that files must be written to the monitored

Pyspark filter not empty

2016-01-29 Thread patcharee
Hi, In pyspark how to filter if a column of dataframe is not empty? I tried: dfNotEmpty = df.filter(df['msg']!='') It did not work. Thanks, Patcharee

spark streaming input rate strange

2016-01-22 Thread patcharee
raises up to 10,000, stays at 10,000 a while and drops to about 7000-8000. - When clients = 20,000 the event rate raises up to 20,000, stays at 20,000 a while and drops to about 15000-17000. The same pattern Processing time is just about 400 ms. Any ideas/suggestions? Thanks, Patcharee

visualize data from spark streaming

2016-01-20 Thread patcharee
Hi, How to visualize realtime data (in graph/chart) from spark streaming? Any tools? Best, Patcharee

bad performance on PySpark - big text file

2015-12-08 Thread patcharee
the log of these two input splits (check python.PythonRunner: Times: total ... ) 15/12/08 07:37:15 INFO rdd.NewHadoopRDD: Input split: hdfs://helmhdfs/user/patcharee/ntap-raw-20151015-20151126/html2/budisansblog.blogspot.com.html:39728447488+134217728 15/12/08 08:49:30 INFO python.PythonRunner

Spark UI - Streaming Tab

2015-12-04 Thread patcharee
need to configure the history UI somehow to get such interface? Thanks, Patcharee

Spark applications metrics

2015-12-04 Thread patcharee
Hi How can I see the summary of data read / write, shuffle read / write, etc of an Application, not per stage? Thanks, Patcharee

Re: Spark UI - Streaming Tab

2015-12-04 Thread patcharee
I ran streaming jobs, but no streaming tab appeared for those jobs. Patcharee On 04. des. 2015 18:12, PhuDuc Nguyen wrote: I believe the "Streaming" tab is dynamic - it appears once you have a streaming job running, not when the cluster is simply up. It does not depend on 1.6 an

Re: Spark Streaming - History UI

2015-12-02 Thread patcharee
I meant there is no streaming tab at all. It looks like I need version 1.6 Patcharee On 02. des. 2015 11:34, Steve Loughran wrote: The history UI doesn't update itself for live apps (SPARK-7889) -though I'm working on it Are you trying to view a running streaming job? On 2 Dec 2015, at 05

Spark Streaming - History UI

2015-12-01 Thread patcharee
Hi, On my history server UI, I cannot see "streaming" tab for any streaming jobs? I am using version 1.5.1. Any ideas? Thanks, Patcharee

custom inputformat recordreader

2015-11-26 Thread Patcharee Thongtra
Hi, In python how to use inputformat/custom recordreader? Thanks, Patcharee

data local read counter

2015-11-25 Thread Patcharee Thongtra
Hi, Is there a counter for data local read? I understood that it is locality level counter, but it seems not. Thanks, Patcharee

How to run parallel on each DataFrame group

2015-11-05 Thread patcharee
roblem is each group after filtered is handled by an executor one by one. How to change the code to allow each group run in parallel? I looked at groupBy, but seem only for aggregation. Thanks, Patcharee

execute native system commands in Spark

2015-11-02 Thread patcharee
Hi, Is it possible to execute native system commands (in parallel) in Spark, like scala.sys.process? Best, Patcharee

Re: sql query orc slow

2015-10-13 Thread Patcharee Thongtra
is not sorted / indexed - the split strategy hive.exec.orc.split.strategy BR, Patcharee On 10/09/2015 08:01 PM, Zhan Zhang wrote: That is weird. Unfortunately, there is no debug info available on this part. Can you please open a JIRA to add some debug information on the driver side? Thanks. Zhan

Re: sql query orc slow

2015-10-13 Thread Patcharee Thongtra
Hi Zhan Zhang, Here is the issue https://issues.apache.org/jira/browse/SPARK-11087 BR, Patcharee On 10/13/2015 06:47 PM, Zhan Zhang wrote: Hi Patcharee, I am not sure which side is wrong, driver or executor. If it is executor side, the reason you mentioned may be possible

Re: sql query orc slow

2015-10-09 Thread patcharee
Yes, the predicate pushdown is enabled, but still take longer time than the first method BR, Patcharee On 08. okt. 2015 18:43, Zhan Zhang wrote: Hi Patcharee, Did you enable the predicate pushdown in the second method? Thanks. Zhan Zhang On Oct 8, 2015, at 1:43 AM, patcharee

Re: sql query orc slow

2015-10-09 Thread patcharee
I set hiveContext.setConf("spark.sql.orc.filterPushdown", "true"). But from the log No ORC pushdown predicate for my query with WHERE clause. 15/10/09 19:16:01 DEBUG OrcInputFormat: No ORC pushdown predicate I did not understand what wrong with this. BR, Patcharee On

Re: sql query orc slow

2015-10-09 Thread patcharee
this time in the log pushdown predicate was generated but results was wrong (no results at all) 15/10/09 18:36:06 INFO OrcInputFormat: ORC pushdown predicate: leaf-0 = (EQUALS x 320) expr = leaf-0 Any ideas What wrong with this? Why the ORC pushdown predicate is not applied by the system? BR

hiveContext sql number of tasks

2015-10-07 Thread patcharee
to force the spark sql to use less tasks? BR, Patcharee

Idle time between jobs

2015-09-16 Thread patcharee
.scala:143 15/09/16 11:21:08 INFO DAGScheduler: Got job 2 (saveAsTextFile at GenerateHistogram.scala:143) with 1 output partitions 15/09/16 11:21:08 INFO DAGScheduler: Final stage: ResultStage 2(saveAsTextFile at GenerateHistogram.scala:143) BR,

spark performance - executor computing time

2015-09-15 Thread patcharee
ize and low gc time as others. What can impact the executor computing time? Any suggestions what parameters I should monitor/configure? BR, Patcharee

spark 1.5 sort slow

2015-09-01 Thread patcharee
y configuration explicitly? Any suggestions? BR, Patcharee

Re: Kryo serialization of classes in additional jars

2015-06-26 Thread patcharee
Hi, I am having this problem on spark 1.4. Do you have any ideas how to solve it? I tried to use spark.executor.extraClassPath, but it did not help BR, Patcharee On 04. mai 2015 23:47, Imran Rashid wrote: Oh, this seems like a real pain. You should file a jira, I didn't see an open issue

Re: HiveContext saveAsTable create wrong partition

2015-06-16 Thread patcharee
I found if I move the partitioned columns in schemaString and in Row to the end of the sequence, then it works correctly... On 16. juni 2015 11:14, patcharee wrote: Hi, I am using spark 1.4 and HiveContext to append data into a partitioned hive table. I found that the data insert

HiveContext saveAsTable create wrong partition

2015-06-16 Thread patcharee
23 columns (longer than Tuple maximum length), so I use Row Object to store raw data, not Tuple. Here is some message from spark when it saved data 15/06/16 10:39:22 INFO metadata.Hive: Renaming src:hdfs://service-10-0.local:8020/tmp/hive-patcharee/hive_2015-06-16_10-39-21_205_8768669104487548472

sql.catalyst.ScalaReflection scala.reflect.internal.MissingRequirementError

2015-06-15 Thread patcharee
) at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:28) at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:410) at org.apache.spark.sql.SQLContext$implicits$.rddToDataFrameHolder(SQLContext.scala:335) BR, Patcharee

Re: hiveContext.sql NullPointerException

2015-06-11 Thread patcharee
from hive I got nothing. How can I fix this? Any suggestions please BR, Patcharee On 07. juni 2015 16:40, Cheng Lian wrote: Spark SQL supports Hive dynamic partitioning, so one possible workaround is to create a Hive table partitioned by zone, z, year, and month dynamically, and then insert

Re: hiveContext.sql NullPointerException

2015-06-08 Thread patcharee
Hi, Thanks for your guidelines. I will try it out. Btw how do you know HiveContext.sql (and also DataFrame.registerTempTable) is only expected to be invoked on driver side? Where can I find document? BR, Patcharee On 07. juni 2015 16:40, Cheng Lian wrote: Spark SQL supports Hive dynamic

Re: hiveContext.sql NullPointerException

2015-06-07 Thread patcharee
Hi, How can I expect to work on HiveContext on the executor? If only the driver can see HiveContext, does it mean I have to collect all datasets (very large) to the driver and use HiveContext there? It will be memory overload on the driver and fail. BR, Patcharee On 07. juni 2015 11:51

write multiple outputs by key

2015-06-06 Thread patcharee
combination) gets datasets. How can I fix this problem? Any suggestions are appreciated. BR, Patcharee

hiveContext.sql NullPointerException

2015-06-06 Thread patcharee
Hi, I try to insert data into a partitioned hive table. The groupByKey is to combine dataset into a partition of the hive table. After the groupByKey, I converted the iterable[X] to DB by X.toList.toDF(). But the hiveContext.sql throws NullPointerException, see below. Any suggestions? What

Re: FetchFailed Exception

2015-06-05 Thread patcharee
Hi, I had this problem before, and in my case it was because the executor/container was killed by yarn when it used more memory than allocated. You can check if your case is the same by checking the yarn node manager log. Best, Patcharee On 05. juni 2015 07:25, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote: I see

NullPointerException SQLConf.setConf

2015-06-04 Thread patcharee
) at java.lang.Thread.run(Thread.java:744) Best, Patcharee

MetaException(message:java.security.AccessControlException: Permission denied

2015-06-03 Thread patcharee
(Unknown Source) at org.apache.hadoop.hive.ql.metadata.Hive.alterPartition(Hive.java:469) ... 26 more BR, Patcharee

Re: ERROR cluster.YarnScheduler: Lost executor

2015-06-03 Thread patcharee
1.3.1, is the problem from the https://issues.apache.org/jira/browse/SPARK-4516? Best, Patcharee On 03. juni 2015 10:11, Akhil Das wrote: Which version of spark? Looks like you are hitting this one https://issues.apache.org/jira/browse/SPARK-4516 Thanks Best Regards On Wed, Jun 3, 2015 at 1

Re: ERROR cluster.YarnScheduler: Lost executor

2015-06-03 Thread patcharee
, chunkIndex=1}, buffer=FileSegmentManagedBuffer{file=/hdisk3/hadoop/yarn/local/usercache/patcharee/appcache/application_1432633634512_0213/blockmgr-12d59e6b-0895-4a0e-9d06-152d2f7ee855/09/shuffle_0_56_0.data, offset=896, length=1132499356}} to /10.10.255.238:35430; closing connection

ERROR cluster.YarnScheduler: Lost executor

2015-06-03 Thread patcharee
Hi, What can be the cause of this ERROR cluster.YarnScheduler: Lost executor? How can I fix it? Best, Patcharee

Insert overwrite to hive - ArrayIndexOutOfBoundsException

2015-06-02 Thread patcharee
) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Best, Patcharee

saveasorcfile on partitioned orc

2015-05-20 Thread patcharee
Hi, I followed the information on https://www.mail-archive.com/reviews@spark.apache.org/msg141113.html to save orc file with spark 1.2.1. I can save data to a new orc file. I wonder how to save data to an existing and partitioned orc file? Any suggestions? BR, Patcharee

override log4j.properties

2015-04-09 Thread patcharee
Hello, How to override log4j.properties for a specific spark job? BR, Patcharee
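
A common recipe for a per-job override: ship a custom log4j.properties with `--files` and point both JVMs at it. The flag below assumes the log4j 1.2-based Spark builds of that era; paths are illustrative:

```shell
# Distribute a job-specific log4j.properties with the application and tell
# the driver and executor JVMs to load it instead of the cluster default.
# (In client mode the driver can also be given the full local path.)
spark-submit \
  --files /path/to/custom-log4j.properties \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:custom-log4j.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:custom-log4j.properties" \
  --class MyApp my-app.jar
```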

Re: Spark Job History Server

2015-03-18 Thread patcharee
) at java.lang.Class.forName(Class.java:191) at org.apache.spark.deploy.history.HistoryServer$.main(HistoryServer.scala:183) at org.apache.spark.deploy.history.HistoryServer.main(HistoryServer.scala) Patcharee On 18. mars 2015 11:35, Akhil Das wrote: You can simply turn

Spark Job History Server

2015-03-18 Thread patcharee
spark.yarn.historyServer.address sandbox.hortonworks.com:19888 But got Exception in thread main java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.history.YarnHistoryProvider What class is really needed? How to fix it? Br, Patcharee

Re: Spark Job History Server

2015-03-18 Thread patcharee
Hi, My spark was compiled with yarn profile, I can run spark on yarn without problem. For the spark job history server problem, I checked spark-assembly-1.3.0-hadoop2.4.0.jar and found that the package org.apache.spark.deploy.yarn.history is missing. I don't know why BR, Patcharee

insert hive partitioned table

2015-03-16 Thread patcharee
of the partitioned column from the temporary table, how can I do that? BR, Patcharee

Re: insert hive partitioned table

2015-03-16 Thread patcharee
I would like to insert the table, and the value of the partition column to be inserted must be from temporary registered table/dataframe. Patcharee On 16. mars 2015 15:26, Cheng Lian wrote: Not quite sure whether I understand your question properly. But if you just want to read

No assemblies found in assembly/target/scala-2.10

2015-03-13 Thread Patcharee Thongtra
) at org.apache.spark.launcher.SparkSubmitCommandBuilder.buildCommand(SparkSubmitCommandBuilder.java:102) at org.apache.spark.launcher.Main.main(Main.java:74) Any ideas? Patcharee

bad symbolic reference. A signature in SparkContext.class refers to term conf in value org.apache.hadoop which is not available

2015-03-11 Thread Patcharee Thongtra
+= "org.apache.spark" %% "spark-core" % "1.3.0" libraryDependencies += "org.apache.spark" %% "spark-streaming" % "1.3.0" libraryDependencies += "org.apache.spark" %% "spark-sql" % "1.3.0" libraryDependencies += "org.apache.spark" % "spark-hive_2.10" % "1.3.0" What should I do to fix it? BR, Patcharee

java.lang.RuntimeException: Couldn't find function Some

2015-03-09 Thread Patcharee Thongtra
tested the same code on spark shell, it worked. Best, Patcharee

Re: insert Hive table with RDD

2015-03-04 Thread patcharee
Hi, I guess that toDF() api in spark 1.3 which is required build from source code? Patcharee On 03. mars 2015 13:42, Cheng, Hao wrote: Using the SchemaRDD / DataFrame API via HiveContext Assume you're using the latest code, something probably like: val hc = new HiveContext(sc) import

insert Hive table with RDD

2015-03-03 Thread patcharee
Hi, How can I insert into an existing hive table with an RDD containing my data? Any examples? Best, Patcharee

NoSuchElementException: None.get

2015-02-27 Thread patcharee
belongs to a method of a case class, it should be executed sequentially? Any ideas? Best, Patcharee --- java.util.NoSuchElementException: None.get at scala.None$.get(Option.scala:313) at scala.None

custom inputformat serializable problem

2015-02-26 Thread patcharee
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0 in stage 0.0 (TID 0) had a not serializable result: no.uni.computing.io.WRFVariableText Any ideas? Best, Patcharee

Re: method newAPIHadoopFile

2015-02-25 Thread patcharee
not complain. Please let me know if this solution is not good enough. Patcharee On 25. feb. 2015 10:57, Sean Owen wrote: OK, from the declaration you sent me separately: public class NetCDFFileInputFormat extends ArrayBasedFileInputFormat public abstract class ArrayBasedFileInputFormat extends

Re: method newAPIHadoopFile

2015-02-25 Thread patcharee
This is the declaration of my custom inputformat public class NetCDFFileInputFormat extends ArrayBasedFileInputFormat public abstract class ArrayBasedFileInputFormat extends org.apache.hadoop.mapreduce.lib.input.FileInputFormat Best, Patcharee On 25. feb. 2015 10:15, patcharee wrote: Hi

method newAPIHadoopFile

2015-02-25 Thread patcharee
, Patcharee

RDD String foreach println

2015-02-24 Thread patcharee
differently on job submit and shell? Best, Patcharee