I am trying to understand the data and computation flow in Spark, and I
believe I fairly well understand the shuffle (both the map and reduce sides), but I do
not get what happens to the computation from the map stages. I know all maps
get pipelined on the shuffle (when there is no other action in
Hi everyone,
SQLContext.createDataFrame behaves differently in Scala and Python:
l = [('Alice', 1)]
sqlContext.createDataFrame(l).collect()
[Row(_1=u'Alice', _2=1)]
sqlContext.createDataFrame(l, ['name', 'age']).collect()
[Row(name=u'Alice', age=1)]
and in Scala :
scala> val data =
Hi,
I was trying to see if I can make Spark avoid hitting the disk for small
jobs, but I see that the SortShuffleWriter.write() always writes to disk. I
found an older thread (
http://apache-spark-user-list.1001560.n3.nabble.com/How-does-shuffle-work-in-spark-td584.html)
saying that it doesn't
We could build with the minimum JDK we support for testing PRs, which will
automatically cause build failures in case code uses a newer API?
Regards,
Mridul
On Fri, May 1, 2015 at 2:46 PM, Reynold Xin r...@databricks.com wrote:
It's really hard to inspect API calls since none of us have the Java
+1
On Sat, May 2, 2015 at 1:09 PM, Mridul Muralidharan mri...@gmail.com
wrote:
We could build with the minimum JDK we support for testing PRs, which will
automatically cause build failures in case code uses a newer API?
Regards,
Mridul
On Fri, May 1, 2015 at 2:46 PM, Reynold Xin
Hi Shane,
Since we are still maintaining support for JDK 6, Jenkins should be
using JDK 6 [1] to ensure we do not inadvertently use a JDK 7 or higher
API, which would break source-level compat.
-source and -target are insufficient to ensure API usage conforms
to the minimum JDK version we are
I've personally prototyped a completely in-memory shuffle for Spark 3 times.
However, it is unclear how big a gain it would be to put all of this in
memory, given newer file systems (ext4, xfs). If the shuffle data is small,
it is still in the file system buffer cache anyway. Note that
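The buffer-cache point can be illustrated outside Spark. This sketch (plain tempfile I/O, not Spark's SortShuffleWriter) writes a small "shuffle file" and immediately reads it back; on Linux the write lands in the page cache first, so the read is served from memory even though a file path was used. The sketch only shows the round trip; the caching itself is the kernel's behaviour:

```python
import os
import tempfile

# A small payload standing in for one map task's shuffle output.
payload = b"key1\tvalue1\nkey2\tvalue2\n"

# Write it to a real file on disk...
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(payload)
    path = f.name

# ...and read it straight back. For data this small the read is
# served from the OS page cache, not the disk platter.
with open(path, "rb") as f:
    data = f.read()

os.unlink(path)
assert data == payload
```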
Hi,
I've posted this problem on user@spark but got no reply, so I've moved it to
dev@spark; sorry for the duplication.
I am wondering if it is possible to submit, monitor, and kill Spark applications
from another service.
I have written a service like this:
parse user commands
translate them into
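One common shape for such a service (assuming it shells out to spark-submit; the class name, jar, and arguments below are hypothetical) is to translate each parsed user command into an argv list, which the service can then hand to subprocess.Popen, keeping the child process handle around to monitor or kill the application:

```python
def build_submit_cmd(master, main_class, app_jar, app_args):
    """Translate a parsed user command into a spark-submit argv list.
    Nothing is executed here; the caller owns the process lifecycle."""
    cmd = [
        "spark-submit",
        "--master", master,
        "--class", main_class,
        app_jar,
    ]
    cmd.extend(app_args)
    return cmd

# Hypothetical example: the service would pass this list to
# subprocess.Popen and track the child to monitor/kill the app.
cmd = build_submit_cmd("yarn-cluster", "com.example.MyApp",
                       "myapp.jar", ["--input", "/data/in"])
assert cmd[0] == "spark-submit"
assert "--master" in cmd and "yarn-cluster" in cmd
```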
I agree, this is better handled by the filesystem cache, not to
mention being able to do zero-copy writes.
Regards,
Mridul
On Sat, May 2, 2015 at 10:26 PM, Reynold Xin r...@databricks.com wrote:
I've personally prototyped a completely in-memory shuffle for Spark 3 times.
However, it is unclear
Part of the reason is that it is really easy to just call toDF in Scala,
and we already have a lot of createDataFrame functions.
(You might find some of the cross-language differences confusing, but I'd
argue most real users just stick to one language, and developers or
trainers are the only ones
It's really hard to inspect API calls since none of us have the Java
standard library in our brain. The only way we can enforce this is to have
it in Jenkins, and Tom you are currently our mini-Jenkins server :)
Joking aside, looks like we should support Java 6 in 1.4, and in the
release notes
To close this thread, rxin created a broader JIRA to handle window functions
in DataFrames: https://issues.apache.org/jira/browse/SPARK-7322
Thanks everyone.
On Wed, Apr 29, 2015 at 22:51, Olivier Girardot
o.girar...@lateral-thoughts.com wrote:
To give you a broader idea of the current use
Maybe I can help a bit. What happens when you call .map(myFunc) is
that you create a MapPartitionsRDD that has a reference to that
closure in its compute() function. When a job is run (jobs are run as
the result of RDD actions):
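The pipelining can be sketched with a toy stand-in for MapPartitionsRDD (plain Python, not Spark's actual classes): each .map just wraps the parent and holds the closure, and no work happens until compute() is called by an "action", at which point every map closure runs element by element in one pass with no intermediate collection materialized:

```python
class ToyRDD:
    """Toy stand-in for an RDD: compute() yields one partition's elements."""
    def __init__(self, data):
        self._data = data

    def compute(self):
        return iter(self._data)

    def map(self, f):
        return MapPartitionsToyRDD(self, f)


class MapPartitionsToyRDD(ToyRDD):
    """Holds a reference to the parent and the closure, like
    MapPartitionsRDD; nothing runs until compute() is called."""
    def __init__(self, parent, f):
        self._parent = parent
        self._f = f

    def compute(self):
        # Pipelined: lazily applies f over the parent's iterator,
        # so chained maps fuse into a single pass per element.
        return (self._f(x) for x in self._parent.compute())


rdd = ToyRDD([1, 2, 3]).map(lambda x: x + 1).map(lambda x: x * 10)
assert list(rdd.compute()) == [20, 30, 40]
```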
that's kinda what we're doing right now, java 7 is the default/standard on
our jenkins.
or, i vote we buy a butler's outfit for thomas and have a second jenkins
instance... ;)
On Sat, May 2, 2015 at 1:09 PM, Mridul Muralidharan mri...@gmail.com
wrote:
We could build with the minimum JDK we support
i think i might be misunderstanding, but shouldn't java 6 currently be used
in jenkins?
On Sat, May 2, 2015 at 11:53 PM, shane knapp skn...@berkeley.edu wrote:
that's kinda what we're doing right now, java 7 is the default/standard on
our jenkins.
or, i vote we buy a butler's outfit for