Re: Spark SQL: filter if column substring does not contain a string

2015-11-15 Thread Ted Yu
Please take a look at test_column_operators in python/pyspark/sql/tests.py FYI On Sat, Nov 14, 2015 at 11:49 PM, YaoPau wrote: > I'm using pyspark 1.3.0, and struggling with what should be simple. > Basically, I'd like to run this: > > site_logs.filter(lambda r: 'page_row'
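The question above is about excluding rows that contain a substring. A minimal sketch of the predicate in plain Python, with the PySpark 1.3 equivalents shown as comments (`site_logs` and the column name `line` are assumptions taken from the excerpt):

```python
# Plain-Python version of the "does not contain" predicate applied to strings.
rows = ["GET /page_row/123", "GET /home", "POST /page_row/edit"]
kept = [r for r in rows if "page_row" not in r]
print(kept)

# PySpark 1.3 equivalents (assuming site_logs is an RDD of strings, or a
# DataFrame df with a string column named 'line'):
#   site_logs.filter(lambda r: "page_row" not in r)
#   df.filter(~df.line.like("%page_row%"))
```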

Re: spark 1.4 GC issue

2015-11-15 Thread Ted Yu
Please take a look at http://www.infoq.com/articles/tuning-tips-G1-GC Cheers On Sat, Nov 14, 2015 at 10:03 PM, Renu Yadav wrote: > I have tried with G1 GC. Please, if anyone can provide their settings for GC. > At code level I am: > 1. reading an ORC table using a dataframe > 2. map
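G1 settings like those discussed in the linked article are passed to executors through JVM options. A hedged `spark-defaults.conf` fragment (the heap-occupancy value is illustrative, not a recommendation; tune it against your own GC logs):

```
spark.executor.extraJavaOptions  -XX:+UseG1GC -XX:InitiatingHeapOccupancyPercent=35 -XX:+PrintGCDetails -XX:+PrintGCDateStamps
```

The `PrintGC*` flags only add logging; they are useful for verifying whether full GCs are actually the bottleneck before changing anything else.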

Re: Kafka Offsets after application is restarted using Spark Streaming Checkpointing

2015-11-15 Thread Cody Koeninger
Not sure on that, maybe someone else can chime in On Sat, Nov 14, 2015 at 4:51 AM, kundan kumar wrote: > Hi Cody , > > Thanks for the clarification. I will try to come up with some workaround. > > I have an another doubt. When my job is restarted, and recovers from the >

Yarn Spark on EMR

2015-11-15 Thread SURAJ SHETH
Hi, the YARN UI on 18080 stops receiving updates for Spark jobs/tasks immediately after a job starts. We see only one task completed in the UI while the others haven't got any resources, when in reality more than 5 tasks would have completed. Hadoop - Amazon 2.6 Spark - 1.5 Thanks and Regards, Suraj Sheth

Data Locality Issue

2015-11-15 Thread Renu Yadav
Hi, I am working on Spark 1.4, reading an ORC table using a DataFrame and converting that DF to an RDD. In the Spark UI I observe that 50% of tasks are running with locality ANY and very few as LOCAL. What would be the possible reason for this? Please help. I have even changed the locality settings. Thanks
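Task locality is mainly governed by the `spark.locality.wait` family of settings: how long the scheduler waits for a slot at each locality level before falling back to the next one. An illustrative `spark-defaults.conf` fragment (the wait values are examples, not recommendations):

```
spark.locality.wait          3s
spark.locality.wait.node     3s
spark.locality.wait.rack     3s
```

Raising these makes the scheduler hold out longer for NODE_LOCAL slots (fewer ANY tasks, possibly more idle time); note that if the data was written by another cluster or service, the blocks may simply not live on the executors' hosts, and no wait setting will change that.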

Re: Very slow startup for jobs containing millions of tasks

2015-11-15 Thread Ted Yu
Kudo goes to Josh. Cheers > On Nov 14, 2015, at 10:04 PM, Jerry Lam wrote: > > Hi Ted, > > That looks exactly what happens. It has been 5 hrs now. The code was built > for 1.4. Thank you very much! > > Best Regards, > > Jerry > > Sent from my iPhone > >> On 14

Re: Spark and Spring Integrations

2015-11-15 Thread Jagat Singh
Not a direct answer to your question, but it might be useful for you to check the Spring XD Spark integration. https://github.com/spring-projects/spring-xd-samples/tree/master/spark-streaming-wordcount-java-processor On Mon, Nov 16, 2015 at 6:14 AM, Muthu Jayakumar wrote: > I

spark sql "create temporary function" scala functions

2015-11-15 Thread Deenar Toraskar
Hi, I wanted to know how to go about registering Scala functions as UDFs using the Spark SQL "create temporary function" statement. Currently I do the following: /* convert prices to holding period returns */ object VaR extends Serializable { def returns(prices: Seq[Double], horizon: Integer):
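For a plain Scala function, the usual route is `sqlContext.udf.register("returns", ...)` rather than the Hive `create temporary function` statement, which expects a Hive UDF class. The holding-period-returns computation itself, sketched in Python for brevity (the exact horizon semantics are an assumption read off the truncated excerpt):

```python
def returns(prices, horizon):
    """Holding-period returns: each price relative to the price `horizon` steps earlier."""
    return [prices[i] / prices[i - horizon] - 1.0
            for i in range(horizon, len(prices))]

# Two consecutive 10% moves yield two returns of ~0.1 each.
print(returns([100.0, 110.0, 121.0], 1))
```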

Spark-shell connecting to Mesos stuck at sched.cpp

2015-11-15 Thread Jong Wook Kim
I'm having a problem connecting my Spark app to a Mesos cluster; any help on the question below would be appreciated. http://stackoverflow.com/questions/33727154/spark-shell-connecting-to-mesos-stuck-at-sched-cpp Thanks, Jong Wook

DynamoDB Connector?

2015-11-15 Thread Charles Cobb
Hi, What is the best practice for reading from DynamoDB from Spark? I know I can use the Java API, but this doesn't seem to take data locality into consideration at all. I was looking for something along the lines of the cassandra connector: https://github.com/datastax/spark-cassandra-connector

Re: Spak filestreaming issue

2015-11-15 Thread Deng Ching-Mallete
Hi, It could be that the timestamp of the file is old. Moving the file does not update the file's timestamp. After you have launched the job, either 'touch' the file if it's already in /opt/test/ to update the timestamp or 'cp' the file to a temporary directory then 'mv' it to /opt/test/. HTH,
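The stage-then-rename pattern described above can be sketched as follows (paths are placeholders; the key points are that the staged copy gets a fresh modification time and that `mv` within one filesystem is an atomic rename, so the streaming job never sees a half-written file):

```shell
watched=/tmp/watched_dir   # directory the streaming job monitors (placeholder)
stage=/tmp/stage_dir       # staging dir on the SAME filesystem (placeholder)
mkdir -p "$watched" "$stage"

echo "event 1" > "$stage/events.txt"          # staged file gets a current mtime
mv "$stage/events.txt" "$watched/events.txt"  # atomic move into the monitored dir
```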

how to get the tracking URL with an IP address instead of a hostname in yarn-cluster mode

2015-11-15 Thread wangpan
Hi, everyone! I deployed Spark on a YARN cluster. I export SPARK_MASTER_IP with an IP address, and make sure that all the Spark configuration files use the IP value in SPARK_HOME/conf/* and all the Hadoop configuration files use the IP value in HADOOP_HOME/etc/*. I can successfully submit a spark job by

Re: Data Locality Issue

2015-11-15 Thread Renu Yadav
What are the parameters on which locality depends? On Sun, Nov 15, 2015 at 5:54 PM, Renu Yadav wrote: > Hi, > > I am working on spark 1.4 and reading a orc table using dataframe and > converting that DF to RDD > > I spark UI I observe that 50 % task are running on locality and

Re: How to passing parameters to another java class

2015-11-15 Thread Zhang, Jingyu
Thanks Fengdong, the startTime and endTime are null in the call(Iterator lines) method. Java does not allow a top-level class to be static. From the Spark docs, I can broadcast them, but I don't know how to receive them from another class. On 16 November 2015 at 16:16, Fengdong Yu

How to passing parameters to another java class

2015-11-15 Thread Zhang, Jingyu
I want to pass two parameters into a new Java class from rdd.mapPartitions(); the code is like the following. ---Source Code Main method: /* the parameters that I want to pass into the PixelGenerator class for selecting any items between the startTime and the endTime. */ int startTime, endTime;

Re: How to passing parameters to another java class

2015-11-15 Thread Fengdong Yu
Can you try : new PixelGenerator(startTime, endTime) ? > On Nov 16, 2015, at 12:47 PM, Zhang, Jingyu wrote: > > I want to pass two parameters into new java class from rdd.mapPartitions(), > the code like following. > ---Source Code > > Main method: > > /*the

Re: How to passing parameters to another java class

2015-11-15 Thread Fengdong Yu
Just make PixelGenerator a nested static class? > On Nov 16, 2015, at 1:22 PM, Zhang, Jingyu wrote: > > Fengdong

Re: How to passing parameters to another java class

2015-11-15 Thread Zhang, Jingyu
Thanks, that worked for local environment but not in the Spark Cluster. On 16 November 2015 at 16:05, Fengdong Yu wrote: > Can you try : new PixelGenerator(startTime, endTime) ? > > > > On Nov 16, 2015, at 12:47 PM, Zhang, Jingyu > wrote: >

Re: How to passing parameters to another java class

2015-11-15 Thread Fengdong Yu
If you got a “cannot serialize” exception, then you need to make PixelGenerator a static class. > On Nov 16, 2015, at 1:10 PM, Zhang, Jingyu wrote: > > Thanks, that worked for the local environment but not in the Spark cluster. > > > On 16 November 2015 at 16:05,
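The constructor-parameter pattern this thread converges on can be sketched in Python for brevity (in Java the analogue is a serializable static nested class whose constructor stores startTime/endTime; the field names and the timestamp comparison below are assumptions, since the original code is truncated):

```python
class PixelGenerator:
    """Holds the window bounds; the instance is shipped to executors with them."""
    def __init__(self, start_time, end_time):
        self.start_time = start_time
        self.end_time = end_time

    def __call__(self, records):
        # Keep only records whose timestamp falls inside [start_time, end_time].
        return [r for r in records if self.start_time <= r <= self.end_time]

gen = PixelGenerator(10, 20)
print(gen([5, 12, 18, 25]))
# In PySpark this instance could be passed as rdd.mapPartitions(gen); in Java,
# the equivalent is handing the parameterized object to mapPartitions as a
# serializable FlatMapFunction, so no outer-class reference gets captured.
```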

Hive on Spark Vs Spark SQL

2015-11-15 Thread kiran lonikar
I would like to know if Hive on Spark uses or shares the execution code with Spark SQL or DataFrames? More specifically, does Hive on Spark benefit from the changes made to Spark SQL, project Tungsten? Or is it a completely different execution path where it creates its own plan and executes on RDDs?

Re: Hive on Spark Vs Spark SQL

2015-11-15 Thread Reynold Xin
It's a completely different path. On Sun, Nov 15, 2015 at 10:37 PM, kiran lonikar wrote: > I would like to know if Hive on Spark uses or shares the execution code > with Spark SQL or DataFrames? > > More specifically, does Hive on Spark benefit from the changes made to >

NoSuchMethodError

2015-11-15 Thread Yogesh Vyas
Hi, While I am trying to read a json file using SQLContext, I get the following error: Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.sql.SQLContext.&lt;init&gt;(Lorg/apache/spark/api/java/JavaSparkContext;)V at com.honeywell.test.testhive.HiveSpark.main(HiveSpark.java:15)

Re: NoSuchMethodError

2015-11-15 Thread Fengdong Yu
The code looks good. Can you check the ‘import’s in your code, because it calls ‘honeywell.test’? > On Nov 16, 2015, at 3:02 PM, Yogesh Vyas wrote: > > Hi, > > While I am trying to read a json file using SQLContext, i get the > following error: > > Exception in

Re: NoSuchMethodError

2015-11-15 Thread Fengdong Yu
Ignore my inputs; I think HiveSpark.java is where your main method is located. Can you paste the whole pom.xml and your code? > On Nov 16, 2015, at 3:39 PM, Fengdong Yu wrote: > > The code looks good. can you check your ‘import’ in your code? because it > calls

Re: NoSuchMethodError

2015-11-15 Thread Fengdong Yu
And, also make sure your scala version is 2.11 for your build. > On Nov 16, 2015, at 3:43 PM, Fengdong Yu wrote: > > Ignore my inputs, I think HiveSpark.java is your main method located. > > can you paste the whole pom.xml and your code? > > > > >> On Nov 16,

Re: Hive on Spark Vs Spark SQL

2015-11-15 Thread kiran lonikar
So it does not benefit from Project Tungsten, right? On Mon, Nov 16, 2015 at 12:07 PM, Reynold Xin wrote: > It's a completely different path. > > > On Sun, Nov 15, 2015 at 10:37 PM, kiran lonikar wrote: > >> I would like to know if Hive on Spark uses or

Re: Hive on Spark Vs Spark SQL

2015-11-15 Thread Reynold Xin
No it does not -- although it'd benefit from some of the work to make shuffle more robust. On Sun, Nov 15, 2015 at 10:45 PM, kiran lonikar wrote: > So does not benefit from Project Tungsten right? > > > On Mon, Nov 16, 2015 at 12:07 PM, Reynold Xin

Re: NoSuchMethodError

2015-11-15 Thread Fengdong Yu
what’s your SQL? > On Nov 16, 2015, at 3:02 PM, Yogesh Vyas wrote: > > Hi, > > While I am trying to read a json file using SQLContext, i get the > following error: > > Exception in thread "main" java.lang.NoSuchMethodError: >

Re: NoSuchMethodError

2015-11-15 Thread Yogesh Vyas
I am trying to just read a JSON file in SQLContext and print the dataframe as follows: SparkConf conf = new SparkConf().setMaster("local").setAppName("AppName"); JavaSparkContext sc = new JavaSparkContext(conf); SQLContext sqlContext = new SQLContext(sc); DataFrame df =
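A `NoSuchMethodError` on the `SQLContext` constructor at runtime usually means the Spark version the code was compiled against differs from the one on the cluster classpath. A hedged pom.xml sketch that keeps the modules in lockstep (the version number and Scala suffix are placeholders; match them to whatever the cluster actually runs):

```xml
<properties>
  <spark.version>1.5.2</spark.version>
</properties>
<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>${spark.version}</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>${spark.version}</version>
    <scope>provided</scope>
  </dependency>
</dependencies>
```

`provided` scope keeps the cluster's own Spark jars authoritative at runtime, so the application jar cannot ship a conflicting copy.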

ReduceByKeyAndWindow does repartitioning twice on recovering from checkpoint

2015-11-15 Thread kundan kumar
Hi, I am using the Spark Streaming checkpointing mechanism and reading data from Kafka. The window duration for my application is 2 hrs with a sliding interval of 15 minutes. So, my batches run at the following intervals... - 09:45 - 10:00 - 10:15 - 10:30 - and so on When my job is

Re: Kafka Offsets after application is restarted using Spark Streaming Checkpointing

2015-11-15 Thread kundan kumar
Sure Thanks !! On Sun, Nov 15, 2015 at 9:13 PM, Cody Koeninger wrote: > Not sure on that, maybe someone else can chime in > > On Sat, Nov 14, 2015 at 4:51 AM, kundan kumar > wrote: > >> Hi Cody , >> >> Thanks for the clarification. I will try to come

Re: Spark and Spring Integrations

2015-11-15 Thread Muthu Jayakumar
I have only written Akka code in Scala. Here is the Akka documentation that would help you get started... http://doc.akka.io/docs/akka/2.4.0/intro/getting-started.html >JavaSparkContext(conf) The idea is to create a SparkContext and pass it as props (constructor in the Java sense) to an