Re: hadoop2.6.0 + spark1.4.1 + python2.7.10

2015-09-06 Thread Ashish Dutt
Hi Aleksandar, Quite some time ago I faced the same problem and found a solution, which I have posted on my blog. See if that helps you; if it does not, you can check out these questions & solutions on Stack Overflow

Re: [streaming] Using org.apache.spark.Logging will silently break task execution

2015-09-06 Thread Gerard Maas
You need to take into consideration 'where' things are executing. The closure of the 'forEachRDD' executes in the driver. Therefore, the log statements printed during the execution of that part will be found in the driver logs. In contrast, the foreachPartition closure executes on the worker
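A minimal Scala sketch of that distinction (the DStream name and element type are illustrative, not taken from the original post; on YARN the executor-side lines would show up via 'yarn logs -applicationId <appId>'):

    import org.apache.spark.Logging
    import org.apache.spark.streaming.dstream.DStream

    object LoggingSketch extends Logging {
      // `source` is assumed to be built elsewhere, as in the original post.
      def wire(source: DStream[String]): Unit = {
        source.foreachRDD { rdd =>
          // This closure runs on the driver, once per batch:
          // the line below ends up in the driver's log.
          logInfo("+++ForEachRDD+++")
          rdd.foreachPartition { partition =>
            // This closure is shipped to the executors:
            // the line below ends up in the executor (worker) logs.
            logInfo("+++ForEachPartition+++")
            partition.foreach(_ => ())
          }
        }
      }
    }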

Re: [streaming] Using org.apache.spark.Logging will silently break task execution

2015-09-06 Thread Понькин Алексей
OK, I got it. When I use the 'yarn logs -applicationId ' command, everything appears in the right place. Thank you! -- Yandex.Mail - reliable mail http://mail.yandex.ru/neo2/collect/?exp=1=1 07.09.2015, 01:44, "Gerard Maas" : > You need to take into consideration 'where'

[streaming] Using org.apache.spark.Logging will silently break task execution

2015-09-06 Thread Alexey Ponkin
Hi, I have the following code: object MyJob extends org.apache.spark.Logging { ... val source: DStream[SomeType] ... source.foreachRDD { rdd => logInfo(s"""+++ForEachRDD+++""") rdd.foreachPartition { partitionOfRecords => logInfo(s"""+++ForEachPartition+++""") } } I

hadoop2.6.0 + spark1.4.1 + python2.7.10

2015-09-06 Thread Sasha Kacanski
Hi, I am successfully running a Python app via PyCharm in local mode with setMaster("local[*]"). When I turn on SparkConf().setMaster("yarn-client") and run via spark-submit PysparkPandas.py, I run into an issue: Error from python worker: /cube/PY/Python27/bin/python: No module named pyspark PYTHONPATH

Re: Problem to persist Hibernate entity from Spark job

2015-09-06 Thread Zoran Jeremic
I have a GenericDAO class which is initialized for each partition. This class uses SessionFactory.openSession() to open a new session in its constructor. As per my understanding, this means that each partition has a different session, but they are using the same SessionFactory to open it. Why not

Re: SparkContext initialization error- java.io.IOException: No space left on device

2015-09-06 Thread shenyan zhen
Thank you both - yup: the /tmp disk space was filled up:) On Sun, Sep 6, 2015 at 11:51 AM, Ted Yu wrote: > Use the following command if needed: > df -i /tmp > > See >

Spark - launchng job for each action

2015-09-06 Thread Priya Ch
Hi All, In Spark, each action results in launching a job. Let's say my Spark app looks like this: val baseRDD = sc.parallelize(Array(1,2,3,4,5), 2) val rdd1 = baseRDD.map(x => x+2) val rdd2 = rdd1.filter(x => x%2 == 0) val count = rdd2.count val firstElement = rdd2.first println("Count is " + count)
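A runnable version of that snippet, with comments marking which statements actually trigger jobs (an illustrative sketch, not taken from the original mail):

    import org.apache.spark.{SparkConf, SparkContext}

    object ActionsDemo {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("actions-demo").setMaster("local[*]"))

        val baseRDD = sc.parallelize(Array(1, 2, 3, 4, 5), 2)
        val rdd1 = baseRDD.map(x => x + 2)       // transformation: no job yet
        val rdd2 = rdd1.filter(x => x % 2 == 0)  // transformation: no job yet

        val count = rdd2.count()          // action: launches job 1
        val firstElement = rdd2.first()   // action: launches job 2 (recomputes the lineage unless rdd2 is cached)
        println("Count is " + count + ", first element is " + firstElement)

        sc.stop()
      }
    }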

Re: Spark - launchng job for each action

2015-09-06 Thread ayan guha
Hi "... Here in job2, when calculating rdd.first..." If you mean if rdd2.first, then it uses rdd2 already computed by rdd2.count, because it is already available. If some partitions are not available due to GC, then only those partitions are recomputed. On Sun, Sep 6, 2015 at 5:11 PM, Jeff

Re: Meets "java.lang.IllegalArgumentException" when test spark ml pipe with DecisionTreeClassifier

2015-09-06 Thread Terry Hole
Hi, Owen, The dataframe "training" is from an RDD of a case class: RDD[LabeledDocument], where the case class is defined as: case class LabeledDocument(id: Long, text: String, *label: Double*) So it already has the default "label" column with "double" type. I already tried to set the

Re: Problem to persist Hibernate entity from Spark job

2015-09-06 Thread Matthew Johnson
I agree with Igor - I would either make sure the session is ThreadLocal or, more simply, why not create the session at the start of the saveInBatch method and close it at the end? Creating a SessionFactory is an expensive operation, but creating a Session is a relatively cheap one. On 6 Sep 2015 07:27,
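Sketched out, that suggestion could look roughly like this; the method name saveInBatch comes from the thread, but everything else here is a hypothetical illustration (the SessionFactory is assumed to be created lazily, once per executor JVM):

    import org.apache.hibernate.SessionFactory

    // Hypothetical helper: open a Session at the start of the method, close it at the end.
    def saveInBatch[T <: AnyRef](records: Iterator[T], sessionFactory: SessionFactory): Unit = {
      val session = sessionFactory.openSession() // cheap, thread-confined Session per call
      val tx = session.beginTransaction()
      try {
        records.foreach(r => session.saveOrUpdate(r))
        tx.commit()
      } catch {
        case e: Exception =>
          tx.rollback()
          throw e
      } finally {
        session.close()
      }
    }

    // Typical call site (HibernateUtil.factory is a hypothetical, lazily initialized singleton):
    // rdd.foreachPartition(partition => saveInBatch(partition, HibernateUtil.factory))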

Re: Meets "java.lang.IllegalArgumentException" when test spark ml pipe with DecisionTreeClassifier

2015-09-06 Thread Sean Owen
I think somewhere along the line you've not specified your label column -- it's defaulting to "label" and it does not recognize it, or at least not as a binary or nominal attribute. On Sun, Sep 6, 2015 at 5:47 AM, Terry Hole wrote: > Hi, Experts, > > I followed the guide

Re: Problem to persist Hibernate entity from Spark job

2015-09-06 Thread Igor Berman
How do you create your session? Do you reuse it across threads? How do you create/close the session manager? Look for the problem in session creation; probably something is deadlocked. As far as I remember, a Hibernate Session should be created per thread. On 6 September 2015 at 07:11, Zoran Jeremic

Re: SparkContext initialization error- java.io.IOException: No space left on device

2015-09-06 Thread Shixiong Zhu
The folder is in "/tmp" by default. Could you use "df -h" to check the free space of /tmp? Best Regards, Shixiong Zhu 2015-09-05 9:50 GMT+08:00 shenyan zhen : > Has anyone seen this error? Not sure which dir the program was trying to > write to. > > I am running Spark
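If /tmp is too small, the scratch location can be pointed somewhere else through spark.local.dir (the path below is a made-up example):

    import org.apache.spark.{SparkConf, SparkContext}

    // Point Spark's shuffle/spill scratch space at a volume with enough free space
    // instead of the default /tmp.
    val conf = new SparkConf()
      .setAppName("my-app")
      .set("spark.local.dir", "/data/spark-tmp")
    val sc = new SparkContext(conf)

On YARN this setting is overridden by the node manager's local directories, so the larger volume has to be configured there instead.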

Re: Spark - launchng job for each action

2015-09-06 Thread Priya Ch
Hi All, Thanks for the info. I have one more doubt - when writing a streaming application, I specify the batch interval. Let's say the interval is 1 sec; for every 1-sec batch an RDD is formed and a job is launched. If there is more than one action specified on an RDD, how many jobs would it launch? I mean
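For what it's worth, the rule from earlier in this thread applies inside each batch as well: every action on the batch's RDD launches its own job, so two actions mean two jobs per interval. A small illustrative sketch (the names are assumed, not from the original mail):

    import org.apache.spark.streaming.dstream.DStream

    def wire(lines: DStream[String]): Unit = {
      lines.foreachRDD { rdd =>
        val cached = rdd.cache()       // optional: avoid recomputing the batch for the second action
        val count = cached.count()     // action 1: one job for this batch
        if (count > 0) {
          val first = cached.first()   // action 2: a second job for the same batch
          println(s"batch count=$count, first=$first")
        }
        cached.unpersist()
      }
    }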

Re: Meets "java.lang.IllegalArgumentException" when test spark ml pipe with DecisionTreeClassifier

2015-09-06 Thread Sean Owen
(Sean) The error suggests that the type is not a binary or nominal attribute though. I think that's the missing step. A double-valued column need not be one of these attribute types. On Sun, Sep 6, 2015 at 10:14 AM, Terry Hole wrote: > Hi, Owen, > > The dataframe

Re: Meets "java.lang.IllegalArgumentException" when test spark ml pipe with DecisionTreeClassifier

2015-09-06 Thread Terry Hole
Sean, Do you know how to tell the decision tree that the "label" is binary, or how to set some attributes on the dataframe to carry the number of classes? Thanks! - Terry On Sun, Sep 6, 2015 at 5:23 PM, Sean Owen wrote: > (Sean) > The error suggests that the type is not a binary or nominal
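One common way to supply that information (an assumption on my part, not spelled out in these snippets) is to pass the label through StringIndexer: its output column carries nominal attribute metadata, including the number of classes, which DecisionTreeClassifier reads. A sketch, assuming `training` already has "label" and "features" columns:

    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.classification.DecisionTreeClassifier
    import org.apache.spark.ml.feature.StringIndexer

    val labelIndexer = new StringIndexer()
      .setInputCol("label")
      .setOutputCol("indexedLabel") // the indexed column carries nominal metadata (number of classes)

    val dt = new DecisionTreeClassifier()
      .setLabelCol("indexedLabel")
      .setFeaturesCol("features")

    val pipeline = new Pipeline().setStages(Array(labelIndexer, dt))
    val model = pipeline.fit(training)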

udaf with multiple return values in spark 1.5.0

2015-09-06 Thread Simon Hafner
Hi everyone, is it possible to return multiple values with a UDAF defined in Spark 1.5.0? The documentation [1] mentions: abstract def dataType: DataType - "The DataType of the returned value of this UserDefinedAggregateFunction." So it's only possible to return a single value. Should I use
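Since a StructType is itself a DataType, one option (an assumption, not confirmed anywhere in this snippet) is to declare a struct-valued result and return a Row from evaluate. A sketch against the Spark 1.5 UDAF API, packing a sum and a count into one result:

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
    import org.apache.spark.sql.types._

    class SumAndCount extends UserDefinedAggregateFunction {
      def inputSchema: StructType = StructType(StructField("value", DoubleType) :: Nil)
      def bufferSchema: StructType =
        StructType(StructField("sum", DoubleType) :: StructField("count", LongType) :: Nil)
      // A single DataType is required, but a StructType can carry several fields.
      def dataType: DataType =
        StructType(StructField("sum", DoubleType) :: StructField("count", LongType) :: Nil)
      def deterministic: Boolean = true
      def initialize(buffer: MutableAggregationBuffer): Unit = { buffer(0) = 0.0; buffer(1) = 0L }
      def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
        if (!input.isNullAt(0)) {
          buffer(0) = buffer.getDouble(0) + input.getDouble(0)
          buffer(1) = buffer.getLong(1) + 1L
        }
      }
      def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
        buffer1(0) = buffer1.getDouble(0) + buffer2.getDouble(0)
        buffer1(1) = buffer1.getLong(1) + buffer2.getLong(1)
      }
      def evaluate(buffer: Row): Any = Row(buffer.getDouble(0), buffer.getLong(1))
    }

    // Usage: df.agg(new SumAndCount()(df("value"))) yields one struct column with fields "sum" and "count".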

Re: SparkContext initialization error- java.io.IOException: No space left on device

2015-09-06 Thread Ted Yu
Use the following command if needed: df -i /tmp See https://wiki.gentoo.org/wiki/Knowledge_Base:No_space_left_on_device_while_there_is_plenty_of_space_available On Sun, Sep 6, 2015 at 6:15 AM, Shixiong Zhu wrote: > The folder is in "/tmp" by default. Could you use "df -h" to

Re: ClassCastException in driver program

2015-09-06 Thread Shixiong Zhu
Looks like there are some circular references in SQL making the immutable List serialization fail in 2.11. In 2.11, Scala's immutable List uses writeReplace()/readResolve(), which don't play nicely with circular references. Here is an example to reproduce this issue in 2.11.6: class Foo extends

Re: Problem with repartition/OOM

2015-09-06 Thread Yana Kadiyska
Thanks Yanbo, I was running with 1 GB per executor; my file is 7.5 GB, and with the standard block size of 128 MB that naturally results in 7500 MB / 128 MB ≈ 59 partitions. My boxes have 8 CPUs, so I figured they could be processing 8 tasks/partitions at a time, needing 8*(partition_size) memory per