Re: Executor tab values in Spark Application UI

2017-02-18 Thread Jacek Laskowski
Hi, Yes, it's the "sum of values for all tasks" (it's based on TaskMetrics, which are accumulators behind the scenes). Why do you say "it appears that value isn't much of help while debugging"? Regards, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0

Re: Serialization error - sql UDF related

2017-02-18 Thread Yong Zhang
You define "getNewColumnName" as a method, which requires the class/object holding it to be serializable. From the stack trace, it looks like this method is defined in ProductDimensionSFFConverterRealApp, but that is not serializable. In fact, your method only uses String and Boolean, which
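A minimal sketch of one common fix (the object name comes from the stack trace; the (String, Boolean) signature and the logic inside are only assumed from the hint above): define the logic as a function value instead of a method, so the UDF closure does not drag the enclosing object into serialization.

  import org.apache.spark.sql.functions.udf

  object ProductDimensionSFFConverterRealApp {
    // A function value (val) captures no reference to the enclosing object,
    // so only this small closure is shipped to the executors.
    val getNewColumnName: (String, Boolean) => String =
      (name, appendSuffix) => if (appendSuffix) name + "_new" else name

    // Wrap it as a UDF for use in DataFrame expressions.
    val getNewColumnNameUdf = udf(getNewColumnName)
  }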

Re: Efficient Spark-Sql queries when only nth Column changes

2017-02-18 Thread Yong Zhang
If you only need group-bys within the same hierarchy, then you can group by at the lowest level and cache it, then use the cached DF to derive the higher levels, so Spark will only scan the original table once and reuse the cache for the rest. val df_base =
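A minimal sketch of that approach, reusing the column names from the question (a hypothetical, untested example): aggregate once at the finest level, cache it, and derive the coarser roll-ups from the cached counts so the Parquet files are scanned only once.

  val df_base = sqlContext.sql(
    "select col1, col2, col3, count(*) as cnt from table group by col1, col2, col3")
  df_base.cache()

  // Higher-level aggregations are computed from the cached counts,
  // not by rescanning the original table.
  val df1 = df_base.groupBy("col1", "col2").sum("cnt")
  val df2 = df_base.groupBy("col1", "col3").sum("cnt")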

Efficient Spark-Sql queries when only nth Column changes

2017-02-18 Thread Patrick
Hi, I have read 5 columns from Parquet into a data frame. My queries on the Parquet table are of the type below: val df1 = sqlContext.sql("select col1,col2,count(*) from table group by col1,col2") val df2 = sqlContext.sql("select col1,col3,count(*) from table group by col1,col3") val df3 =

Re: Avalanche of warnings trying to read Spark 1.6.X Parquet into Spark 2.X

2017-02-18 Thread Stephen Boesch
For now I have added the following to log4j.properties: log4j.logger.org.apache.parquet=ERROR 2017-02-18 11:50 GMT-08:00 Stephen Boesch: > The following JIRA mentions that a fix made to read Parquet 1.6.2 into 2.X > STILL leaves an "avalanche" of warnings:
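If editing log4j.properties is awkward in a given setup, the same silencing can be done programmatically; a sketch using the log4j 1.x API bundled with Spark 2.x (an alternative approach, not what the poster did):

  import org.apache.log4j.{Level, Logger}

  // Equivalent to log4j.logger.org.apache.parquet=ERROR in log4j.properties:
  // suppress anything below ERROR coming from the Parquet reader.
  Logger.getLogger("org.apache.parquet").setLevel(Level.ERROR)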

Avalanche of warnings trying to read Spark 1.6.X Parquet into Spark 2.X

2017-02-18 Thread Stephen Boesch
The following JIRA mentions that a fix made to read Parquet 1.6.2 into 2.X STILL leaves an "avalanche" of warnings: https://issues.apache.org/jira/browse/SPARK-17993 Here is the text from one of the last comments before it was merged: I have built the code from the PR and it indeed

Re: Query data in subdirectories in Hive Partitions using Spark SQL

2017-02-18 Thread Jon Gregg
Spark has partition discovery if your data is laid out in a Parquet-friendly directory structure: http://spark.apache.org/docs/latest/sql-programming-guide.html#partition-discovery You can also use wildcards to read subdirectories (I'm using Spark 1.6 here): data2 =
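A short sketch of both options with a hypothetical directory layout, using the Spark 1.6-era sqlContext API mentioned above:

  // Partition discovery: a layout like /data/events/year=2017/month=02/...
  // is read back with extra columns `year` and `month` automatically.
  val data1 = sqlContext.read.parquet("/data/events")

  // Wildcards: explicitly pull in every subdirectory one level down.
  val data2 = sqlContext.read.parquet("/data/events/*/*")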

Re: question on SPARK_WORKER_CORES

2017-02-18 Thread Yan Facai
Hi, kodali. SPARK_WORKER_CORES is designed for the cluster resource manager; see http://spark.apache.org/docs/latest/cluster-overview.html if interested. For standalone mode, you should use the following 3 arguments to allocate resources for normal Spark tasks: - --executor-memory -
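A sketch of what that allocation can look like for a standalone cluster (the master URL and values are placeholders, not from the original message):

  import org.apache.spark.SparkConf
  import org.apache.spark.sql.SparkSession

  // These settings map to the spark-submit flags --executor-memory,
  // --executor-cores and --total-executor-cores respectively.
  val conf = new SparkConf()
    .setAppName("resource-allocation-example")
    .setMaster("spark://master:7077")
    .set("spark.executor.memory", "4g")   // memory per executor
    .set("spark.executor.cores", "2")     // cores per executor
    .set("spark.cores.max", "8")          // total cores for the application

  val spark = SparkSession.builder().config(conf).getOrCreate()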

ClassCastException while reading from GS and writing to S3; I think it happens while writing to S3

2017-02-18 Thread Manohar753
Hi All, I am able to run my simple Spark job (read from GS, write to S3) locally, but when I move it to the cluster I get the cast exception below. The Spark environment I am using is 2.0.1. Please help out if anyone has faced this kind of issue already. 02/18 10:35:23 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0,