How many CPU cores are on that machine? Read http://qr.ae/8Uv3Xq
You can also confirm the above by running the pmap utility on your process;
most of the virtual memory will show up as 'anon'.
On Fri, 13 May 2016 09:11 jone, wrote:
> The virtual memory is 9G When i run
Can you please do the following:
jps|grep SparkSubmit|
and send the output of
ps aux|grep pid
top -p PID
and the output of
free
HTH
Dr Mich Talebzadeh
LinkedIn:
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
This is not the expected behavior, can you maybe post the code where you
are running into this?
On Thursday, May 12, 2016, Dood@ODDO wrote:
> Hello all,
>
> I have been programming for years but this has me baffled.
>
> I have an RDD[(String,Int)] that I return from a
Ping!!
Has anybody tested graceful shutdown of a Spark Streaming application in yarn-cluster
mode? It looks like a defect to me.
On Thu, May 12, 2016 at 12:53 PM Rakesh H (Marketing Platform-BLR) <
rakes...@flipkart.com> wrote:
> We are on spark 1.5.1
> Above change was to add a shutdown hook.
> I am not
Spark has moved to build using Scala 2.11 by default in master/trunk.
As for the 2.0.0-SNAPSHOT, it is actually the version of master/trunk and
you might be missing some modules/profiles for your build. What command did
you use to build?
On Thu, May 12, 2016 at 9:01 PM, Raghava Mutharaju <
Hello All,
I built Spark from the source code available at
https://github.com/apache/spark/. Although I haven't specified the
"-Dscala-2.11" option (to build with Scala 2.11), from the build messages I
see that it ended up using Scala 2.11. Now, for my application's sbt build, what
should be the spark
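Presumably the question is about the Spark dependency version in build.sbt; a minimal sketch, assuming the locally built 2.0.0-SNAPSHOT artifacts have been published to the local repository (e.g. via build/sbt publishLocal) — the version and Scala version strings are illustrative:

// build.sbt (sketch)
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0-SNAPSHOT"
libraryDependencies += "org.apache.spark" %% "spark-sql"  % "2.0.0-SNAPSHOT"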
The virtual memory is 9G when I run org.apache.spark.examples.SparkPi in yarn-cluster mode, using the default configurations.
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
Hi,
When we use Spark 1.6.1 to word-count a file of 25 MB, with 2 nodes in
cluster mode, it takes 10 seconds to finish the task.
Why is it so slow?
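For reference, a minimal word-count sketch of the kind of job described (the input path is illustrative):

val counts = sc.textFile("hdfs:///path/to/25mb-file.txt")
  .flatMap(_.split("\\s+"))          // split lines into words
  .map(word => (word, 1))
  .reduceByKey(_ + _)                // sum counts per word
counts.take(10).foreach(println)

With an input this small, much of the wall-clock time is typically job setup and scheduling overhead rather than the word count itself.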
Hello all,
I have been programming for years but this has me baffled.
I have an RDD[(String,Int)] that I return from a function after
extensive manipulation of an initial RDD of a different type. When I
return this RDD and initiate the .collectAsMap() on it from the caller,
I get an empty
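A minimal sketch of the pattern being described, with hypothetical names and a trivial transformation standing in for the extensive manipulation; collectAsMap() only comes back empty when the upstream transformations produce no records:

import org.apache.spark.rdd.RDD

def buildCounts(lines: RDD[String]): RDD[(String, Int)] =
  lines.flatMap(_.split("\\s+")).map((_, 1)).reduceByKey(_ + _)   // stand-in for the real logic

val pairs: RDD[(String, Int)] = buildCounts(sc.textFile("hdfs:///path/to/input"))
val asMap: scala.collection.Map[String, Int] = pairs.collectAsMap()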
Nobody has the answer?
Another thing I've seen is that if I have no documents at all:
scala> df.select(explode(df("addresses.id")).as("aid")).collect
res27: Array[org.apache.spark.sql.Row] = Array()
Then
scala> df.select(explode(df("addresses.id")).as("aid"), df("id"))
Sorry, the bug link in the previous mail was wrong.
Here is the real link:
http://apache-spark-developers-list.1001551.n3.nabble.com/Re-SQL-Memory-leak-with-spark-streaming-and-spark-sql-in-spark-1-5-1-td14603.html
At 2016-05-13 09:49:05, "李明伟" wrote:
It seems
The link below doesn't refer to a specific bug.
Can you send the correct link?
Thanks
> On May 12, 2016, at 6:50 PM, "kramer2...@126.com" wrote:
>
> It seems we hit the same issue.
>
> There was a bug on 1.5.1 about memory leak. But I am using 1.6.1
>
> Here is the link
It seems we hit the same issue.
There was a bug in 1.5.1 about a memory leak, but I am using 1.6.1.
Here is the link about the bug in 1.5.1:
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
At 2016-05-12 23:10:43, "Simon Schiff [via Apache Spark User List]"
Thank you Takeshi.
After executing df3.explain(true) I realised that the optimiser batches are
being performed, and also the predicate push-down.
I think that only the analyser batches are executed when creating the data
frame via context.sql(query). It seems that the optimiser batches are
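For reference, a small sketch of inspecting the plans as described above (table names and the predicate are illustrative); explain(true) prints the parsed, analysed, optimised and physical plans, which is where the optimiser batches and predicate push-down become visible:

val df3 = sqlContext.sql("SELECT t1.id, t2.value FROM t1 JOIN t2 ON t1.id = t2.id WHERE t2.value > 10")
df3.explain(true)   // parsed -> analysed -> optimised -> physical plan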
Hi Arun,
Could you try using StAX or JAXB?
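For example, a minimal StAX sketch, assuming self-closing <Record .../> elements of the kind discussed in this thread (the sample record and attribute name are illustrative):

import java.io.StringReader
import javax.xml.stream.{XMLInputFactory, XMLStreamConstants}

val xml = """<Record type="HKQuantityTypeIdentifierHeartRate" sourceName="Watch"/>"""
val reader = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml))
while (reader.hasNext) {
  // attributes of a self-closing element are available on its START_ELEMENT event
  if (reader.next() == XMLStreamConstants.START_ELEMENT && reader.getLocalName == "Record")
    println(reader.getAttributeValue(null, "type"))
}
reader.close()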
Thanks,
Pradeep
> On May 12, 2016, at 8:35 PM, Hyukjin Kwon wrote:
>
> Hi Arunkumar,
>
>
> I guess your records are self-closing ones.
>
> There is an issue open here, https://github.com/databricks/spark-xml/issues/92
>
>
Hi Arunkumar,
I guess your records are self-closing ones.
There is an issue open here,
https://github.com/databricks/spark-xml/issues/92
This is about XmlInputFormat.scala, and it seems a bit tricky to handle the
case, so I have left it open until now.
Thanks!
2016-05-13 5:03 GMT+09:00 Arunkumar
Not sure what you mean? You want to have exactly one query running fine in
both SQLContext and HiveContext? The query parsers are different; why do you
want this feature? Do I understand your question correctly?
Yong
Date: Thu, 12 May 2016 13:09:34 +0200
Subject: SQLContext and
Hi,
Which version of Spark do you use?
The recent one cannot handle this kind of spilling, see:
http://spark.apache.org/docs/latest/tuning.html#memory-management-overview.
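For reference, a hedged sketch of the memory-related settings described on that page for Spark 1.6+ (the values are illustrative, not recommendations):

val conf = new org.apache.spark.SparkConf()
  .set("spark.memory.fraction", "0.75")        // share of heap for execution + storage
  .set("spark.memory.storageFraction", "0.5")  // portion of the above protected from eviction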
// maropu
On Fri, May 13, 2016 at 8:07 AM, Ashok Kumar
wrote:
> Hi,
>
> How one can avoid
Hi,
How can one avoid having Spark spill over after filling the node's memory?
Thanks
I agree that it can help build a community and be a place for real-time
conversations.
Xinh
On Thu, May 12, 2016 at 12:28 AM, Paweł Szulc wrote:
> Hi,
>
> well I guess the advantage of gitter over maling list is the same as with
> IRC. It's not actually a replacer because
Ubuntu 14.04
On Thu, May 12, 2016 at 2:40 PM, Ted Yu wrote:
> Which OS are you using ?
>
> See http://en.linuxreviews.org/HOWTO_enable_core-dumps
>
> On Thu, May 12, 2016 at 2:23 PM, prateek arora > wrote:
>
>> Hi
>>
>> I am running my spark
Which OS are you using ?
See http://en.linuxreviews.org/HOWTO_enable_core-dumps
On Thu, May 12, 2016 at 2:23 PM, prateek arora
wrote:
> Hi
>
> I am running my spark application with some third party native libraries .
> but it crashes some time and show error "
Hi
I am running my Spark application with some third-party native libraries,
but it sometimes crashes and shows the error "Failed to write core dump. Core
dumps have been disabled. To enable core dumping, try "ulimit -c unlimited"
before starting Java again".
Below is the log:
A fatal error
Yep, the same error I got:
root
 |-- a: array (nullable = true)
 |    |-- element: integer (containsNull = false)
 |-- b: integer (nullable = false)
NoViableAltException(35@[])
at org.apache.hadoop.hive.ql.parse.HiveParser.primitiveType(HiveParser.java:38886)
at
Hello,
Greetings.
I'm trying to process an XML file exported from the Health Kit application using
Spark SQL, for learning purposes. The sample record data is like the below:
.
I want the column names of my table to be the field values like type,
sourceName and sourceVersion, and the row entries
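A hedged sketch of one way to approach this with the databricks spark-xml package, assuming <Record .../> rows (the file path is illustrative; by default spark-xml exposes XML attributes as columns prefixed with "_"):

val df = sqlContext.read
  .format("com.databricks.spark.xml")
  .option("rowTag", "Record")
  .load("export.xml")

df.select("_type", "_sourceName", "_sourceVersion").show()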
This should be related:
https://github.com/JodaOrg/joda-time/issues/307
Do you have more of the stack trace?
Cheers
On Thu, May 12, 2016 at 12:39 PM, Younes Naguib <
younes.nag...@tritondigital.com> wrote:
> Thanks,
>
> I used that.
>
> Now I seem to have the following problem:
>
>
Thanks,
I used that.
Now I seem to have the following problem:
java.lang.NullPointerException
at org.joda.time.tz.CachedDateTimeZone.getInfo(CachedDateTimeZone.java:143)
at org.joda.time.tz.CachedDateTimeZone.getOffset(CachedDateTimeZone.java:103)
at
I'm using Spark 1.6.1 (hadoop-2.6) and I'm trying to load a file that's
in s3. I've done this previously with spark 1.5 with no issue. Attempting
to load and count a single file as follows:
dataFrame = sqlContext.read.text('s3a://bucket/path-to-file.csv')
dataFrame.count()
But when it
Hello,
I have a random forest that works fine with 20 trees on 5e6 LabeledPoints
for training and 300 features... but when I try to scale it up just a bit
to 60 or 100 trees and 10e6 training points, it consistently gets
ExecutorLostFailures due to "no recent heartbeats" with a timeout of 120s.
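One thing sometimes tried for "no recent heartbeats" failures caused by long GC pauses during large tree-ensemble training is raising the heartbeat and network timeouts; a sketch with illustrative values, not a definitive fix:

val conf = new org.apache.spark.SparkConf()
  .set("spark.executor.heartbeatInterval", "60s")
  .set("spark.network.timeout", "600s")   // must stay larger than the heartbeat interval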
Any inputs on this issue?
Regards,
Sourav
On Tue, May 10, 2016 at 6:17 PM, Sourav Mazumder <
sourav.mazumde...@gmail.com> wrote:
> Hi,
>
> Need to get bit more understanding of reliability aspects of the Custom
> Receivers in the context of the code in spark-streaming-jms
>
I'm using Spark 1.6.1 along with Scala 2.11.7 on my Ubuntu 14.04, with the
following memory settings for my project: JAVA_OPTS="-Xmx8G -Xms2G". My
data is organized in 20 JSON-like files; every file is about 8-15 MB,
containing categorical and numerical values. I parse this data, passing by
DataFrame
Hi,
What's the result of `df3.explain(true)`?
// maropu
On Thu, May 12, 2016 at 10:04 AM, Telmo Rodrigues <
telmo.galante.rodrig...@gmail.com> wrote:
> I'm building spark from branch-1.6 source with mvn -DskipTests package and
> I'm running the following code with spark shell.
>
> *val*
I would like to also Mich, please send it through, thanks!
On Thu, 12 May 2016 at 15:14 Alonso Isidoro wrote:
> Me too, send me the guide.
>
> Sent from my iPhone
>
> On 12 May 2016, at 12:11, Ashok Kumar >
Me too, send me the guide.
Sent from my iPhone
> On 12 May 2016, at 12:11, Ashok Kumar
> wrote:
>
> Hi Dr Mich,
>
> I will be very keen to have a look at it and review if possible.
>
> Please forward me a copy
>
> Thanking you warmly
>
>
> On
Alternatively, you may try the built-in function:
regexp_extract
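For example, a minimal sketch pulling the last path element with the built-in function (the column name and pattern are illustrative):

import org.apache.spark.sql.functions.{col, regexp_extract}

val withLast = df.withColumn("last", regexp_extract(col("path"), "([^/]+)$", 1))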
> On May 12, 2016, at 20:27, Ewan Leith wrote:
>
> You could use a UDF pretty easily, something like this should work, the
> lastElement function could be changed to do pretty much any string
>
Hi Xinh
Sorry for my late reply.
It's slow because of two reasons (at least to my knowledge):
1. lots of I/O - writing as JSON, then reading and writing again as Parquet;
2. because of the nested RDD I can't run the cycle and filter by event_type
in parallel - this applies to your solution (3rd
Make a function (or lambda) that reads the text file. Make an RDD with a
list of X/Y, then map that RDD through the file-reading function. Same
with your X/Y/Z directory. You then have RDDs with the content of each file
as a record. Work with those as needed.
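A minimal sketch of that approach, with illustrative paths and a plain text reader (the files must be readable from the executors for this to work):

def readFile(path: String): String = {
  val src = scala.io.Source.fromFile(path)
  try src.mkString finally src.close()
}

val paths = sc.parallelize(Seq("/data/X/Y/part-0", "/data/X/Y/part-1"))
val contents = paths.map(readFile)   // one record per file's full content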
On Wed, May 11, 2016 at 2:36 PM
Hi there,
I'd like to write some iterative computation, i.e., computation that can be
done via a for loop. I understand that in Spark foreach is a better choice.
However, foreach and foreachPartition seem to be for self-contained
computation that only involves the corresponding Row or Partition,
You could use a UDF pretty easily; something like this should work. The
lastElement function could be changed to do pretty much any string manipulation
you want.
import org.apache.spark.sql.functions.udf
def lastElement(input: String) = input.split("/").last
val lastElementUdf =
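A hedged guess at how the truncated line might continue, plus a usage example (the column name is illustrative):

import org.apache.spark.sql.functions.{col, udf}

val lastElementUdf = udf(lastElement _)
val withLast = df.withColumn("last", lastElementUdf(col("path")))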
Hi,
I just want to figure out why the two contexts behave differently, even on
a simple query.
In a nutshell, I have a query in which there is a String containing a single
quote and a cast to Array/Map.
I have tried all the combinations of different types of SQL context and query call
API (sql,
Thanks Raghav.
I have 5+ million records. I feel creating multiple columns is not an optimal way.
Please suggest any other alternate solution.
Can't we do any string operation in DF.select?
Regards,
Raja
From: Raghavendra Pandey
Sent: 11 May 2016 09:04 PM
To: Bharathi Raja
Cc: User
Subject: Re:
Hi Dr Mich,
I will be very keen to have a look at it and review if possible.
Please forward me a copy
Thanking you warmly
On Thursday, 12 May 2016, 11:08, Mich Talebzadeh
wrote:
Hi All,
Following the threads in spark forum, I decided to write up on
Hi All,
Following the threads in the Spark forum, I decided to write up on the
configuration of Spark, including allocation of resources, configuration
of the driver, executors and threads, execution of Spark apps and general
troubleshooting, taking into account the allocation of resources for Spark
Hi,
Do you know what this message means?
org.apache.spark.shuffle.FetchFailedException: Failed to connect to
localhost/127.0.0.1:50606
at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:323)
at
Hi Amit,
This is very interesting indeed because I have got similar results. I tried
doing a filter + groupBy using a Dataset with a function, and using the
inner RDD of the DataFrame (RDD[Row]). I used the inner RDD of a DataFrame because
apparently there is no straightforward way to create an RDD of
Hello,
I have the same problem... Sometimes I get the error: "Py4JError: Answer
from Java side is empty"
Sometimes my code works fine but sometimes not...
Did you find out why it might occur? What was the reason?
Thanks.
Hello.
I'm sorry, but did you find the answer?
I have a similar error and I cannot solve it... No one answered me...
The Spark driver dies and I get the error "Answer from Java side is empty".
I thought that it was because I made a mistake in this conf file.
I use Sparkling Water 1.6.3, Spark
Hi Guys,
Has any of you tried this mechanism before?
I am able to run it locally and get the output. But how do I submit the
job to the YARN cluster using Spark-JobServer?
Any documentation?
Regards
Ashesh
I am using a Spark cluster on Amazon (launched using the
spark-1.6-prebuilt-with-hadoop-2.6 spark-ec2 script)
to run a Scala driver application that reads S3 object content in parallel.
I have tried "s3n://bucket" with sc.textFile, as well as setting up an RDD with
the S3 keys and then using the
Java AWS SDK
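A hedged sketch of the second approach (an RDD of keys read through the AWS Java SDK v1); the bucket, keys, and credentials handling are illustrative:

import com.amazonaws.services.s3.AmazonS3Client
import scala.io.Source

val keys = sc.parallelize(Seq("prefix/part-0000", "prefix/part-0001"))
val contents = keys.mapPartitions { iter =>
  val s3 = new AmazonS3Client()                      // one client per partition
  iter.map { key =>
    val obj = s3.getObject("my-bucket", key)
    try Source.fromInputStream(obj.getObjectContent).mkString
    finally obj.close()
  }
}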
Hi,
well I guess the advantage of Gitter over a mailing list is the same as with
IRC. It's not actually a replacement, because the mailing list is also important.
But it is a lot easier to build a community around a tool with the ad-hoc ability
to connect with each other.
I have Gitter running constantly, I
We are on spark 1.5.1
The above change was to add a shutdown hook.
I am not adding a shutdown hook in code, so the inbuilt shutdown hook is being
called.
The driver signals that it is going to do a graceful shutdown, but the executor sees
that the driver is dead and shuts down abruptly.
Could this issue be related to
This is happening because the Spark context shuts down without shutting down
the ssc first.
This was the behavior till Spark 1.4 and was addressed in later releases.
https://github.com/apache/spark/pull/6307
Which version of spark are you on?
Thanks
Deepak
On Thu, May 12, 2016 at 12:14 PM, Rakesh H
Yes, it seems to be the case.
In this case the executors should have continued logging values till 300, but
they are shut down as soon as I do "yarn kill .."
On Thu, May 12, 2016 at 12:11 PM Deepak Sharma
wrote:
> So in your case , the driver is shutting down gracefully ,
So in your case the driver is shutting down gracefully, but the
executors are not.
Is this the problem?
Thanks
Deepak
On Thu, May 12, 2016 at 11:49 AM, Rakesh H (Marketing Platform-BLR) <
rakes...@flipkart.com> wrote:
> Yes, it is set to true.
> Log of driver :
>
> 16/05/12 10:18:29 ERROR
Yes, it is set to true.
Log of driver :
16/05/12 10:18:29 ERROR yarn.ApplicationMaster: RECEIVED SIGNAL 15: SIGTERM
16/05/12 10:18:29 INFO streaming.StreamingContext: Invoking
stop(stopGracefully=true) from shutdown hook
16/05/12 10:18:29 INFO scheduler.JobGenerator: Stopping JobGenerator
Hi Rakesh
Did you try setting spark.streaming.stopGracefullyOnShutdown to true for
your Spark configuration instance?
If not, try this and let us know if it helps.
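A minimal sketch of setting that flag when building the streaming context (the app name and batch interval are illustrative):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("my-streaming-app")
  .set("spark.streaming.stopGracefullyOnShutdown", "true")
val ssc = new StreamingContext(conf, Seconds(10))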
Thanks
Deepak
On Thu, May 12, 2016 at 11:42 AM, Rakesh H (Marketing Platform-BLR) <
rakes...@flipkart.com> wrote:
> Issue i
The issue I am having is similar to the one mentioned here:
http://stackoverflow.com/questions/36911442/how-to-stop-gracefully-a-spark-streaming-application-on-yarn
I am creating an RDD from a sequence of 1 to 300 and creating a streaming RDD
out of it.
val rdd = ssc.sparkContext.parallelize(1 to 300)
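A minimal sketch of turning the rdd above into a streaming input via a queue-based stream (the foreachRDD output is illustrative):

import scala.collection.mutable

val stream = ssc.queueStream(mutable.Queue(rdd))
stream.foreachRDD(batch => println(s"batch size = ${batch.count()}"))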