You put your quotes in the wrong place. See
https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools
On Wed, Feb 26, 2014 at 10:04 PM, Bryn Keller xol...@xoltar.org wrote:
Hi Folks,
I've tried using sbt test-only '*PairRDDFunctionsSuite' to run only that
test suite, which
and if I have to work
on a PR, I should rather make use of my github account...
Thanks for the clarification.
On Sat, Mar 1, 2014 at 12:27 PM, Reynold Xin r...@databricks.com
wrote:
I'm not sure what you mean by enterprise stash.
But a PR is a concept unique to GitHub. There is no PR
Take a look at
https://cwiki.apache.org/confluence/display/SPARK/Spark+Internals
On Sat, Mar 15, 2014 at 6:19 PM, David Thomas dt5434...@gmail.com wrote:
Is there any documentation available that explains the code architecture
that can help a new Spark framework developer?
It's mostly stock CentOS installation with some scripts.
On Thu, Mar 20, 2014 at 2:53 AM, Usman Ghani us...@platfora.com wrote:
Is there anything special about the spark AMIs or are they just stock
CentOS installations?
was that job (I guess in terms of number of transforms and
actions) and how long did that take to process?
-Suren
On Thu, Mar 20, 2014 at 2:08 PM, Reynold Xin r...@databricks.com wrote:
Actually we just ran a job with 70TB+ compressed data on 28 worker nodes
-
I didn't count the size
Nick and Koert summarized it pretty well. Just to clarify and give some
concrete examples.
If you want to start with a specific vertex, and follow some path, it is
probably easier and faster to use some key-value store, or even MySQL or a
graph database.
If you want to count the average length
Thanks for contributing!
I think often unless the feature is gigantic, you can send a pull request
directly for discussion. One rule of thumb in the Spark code base is that
we typically prefer readability over conciseness, and thus we tend to avoid
using too much Scala magic or operator
I added the config option to use the non-default serializer. However, at
the time, Kryo failed to serialize pretty much any closures, so that option
was never really used / recommended.
Since then the Scala ecosystem has developed, and some other projects are
starting to use Kryo to serialize more
/TaskResultGetter.scala#L39
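For context, a minimal sketch of the two separate knobs being discussed (the configurable data serializer vs. the RDD storage level); the config key and class names are standard Spark usage, and the rdd value is assumed, not code from this thread:

import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel

// The data serializer is configurable:
val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

// MEMORY_ONLY_SER only controls how the RDD data is stored in memory;
// closures are still shipped with the (Java) closure serializer:
rdd.persist(StorageLevel.MEMORY_ONLY_SER)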
Would storing my RDD as MEMORY_ONLY_SER prevent the closure serializer from
trying to deal with my clojure.lang.PersistentVector class?
Where do I go from here?
On Sun, May 4, 2014 at 12:50 PM, Reynold Xin r...@databricks.com wrote:
I added the config option to use
as MEMORY_ONLY_SER prevent the closure serializer
from
trying to deal with my clojure.lang.PersistentVector class?
Where do I go from here?
On Sun, May 4, 2014 at 12:50 PM, Reynold Xin r...@databricks.com
wrote:
I added the config option to use the non-default serializer. However
The main reason is that it doesn't always work (e.g. sometimes the
application program already has special serialization / externalization
written for Java that doesn't work in Kryo).
On Mon, May 12, 2014 at 5:47 PM, Anand Avati av...@gluster.org wrote:
Hi,
Can someone share the reason why Kryo
Thanks for the experiments and analysis!
I think Michael already submitted a patch that avoids scanning all columns
for count(*) or count(1).
On Mon, May 12, 2014 at 9:46 PM, Andrew Ash and...@andrewash.com wrote:
Hi Spark devs,
First of all, huge congrats on the parquet integration with
Thanks for pointing it out. We should update the website to fix the code.
val count = spark.parallelize(1 to NUM_SAMPLES).map { i =>
  val x = Math.random()
  val y = Math.random()
  if (x*x + y*y < 1) 1 else 0
}.reduce(_ + _)
println("Pi is roughly " + 4.0 * count / NUM_SAMPLES)
On Fri, May 16,
This was an optimization that reuses a triplet object in GraphX, and when
you do a collect directly on triplets, the same object is returned.
It has been fixed in Spark 1.0 here:
https://issues.apache.org/jira/browse/SPARK-1188
To work around it in older versions of Spark, you can add a copy step, as in the sketch below.
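A minimal sketch of that copy step, assuming a GraphX graph named graph (the fields are those of EdgeTriplet):

val materialized = graph.triplets
  .map(t => (t.srcId, t.dstId, t.srcAttr, t.dstAttr, t.attr))
  .collect()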
reduce always returns a single element - maybe you are misunderstanding what
the reduce function in collections does.
On Mon, May 19, 2014 at 3:32 PM, GlennStrycker glenn.stryc...@gmail.comwrote:
I tried adding .copy() everywhere, but still only get one element returned,
not even an RDD
You are probably looking for reduceByKey in that case.
reduce just reduces everything in the collection into a single element.
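A small illustration, assuming an existing SparkContext named sc (the pair-RDD functions need the SparkContext._ implicits in 1.x-era Spark):

import org.apache.spark.SparkContext._

val pairs = sc.parallelize(Seq(("a", 1), ("a", 2), ("b", 3)))
// reduce collapses the whole RDD into one value:
val total = pairs.map(_._2).reduce(_ + _)        // 6
// reduceByKey produces one element per key:
val perKey = pairs.reduceByKey(_ + _).collect()  // Array((a,3), (b,3))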
On Tue, May 20, 2014 at 12:16 PM, GlennStrycker glenn.stryc...@gmail.comwrote:
Wait a minute... doesn't a reduce function return 1 element PER key pair?
For example,
You can submit a pull request on the github mirror:
https://github.com/apache/spark
Thanks.
On Wed, May 21, 2014 at 10:59 PM, npanj nitinp...@gmail.com wrote:
Hi,
For my project I needed to load a graph with edge weight; for this I have
updated GraphLoader.edgeListFile to consider third
Thanks for sending this in.
The ASF list doesn't support html so the formatting of the code is a little
messed up. For those who want to see the code in clearly formatted text, go
to
http://apache-spark-developers-list.1001551.n3.nabble.com/Kryo-serialization-for-closures-a-workaround-tp6787.html
Would you like to submit a pull request to update it?
Also in the latest version HyperLogLog is serializable. That means we can
get rid of the SerializableHyperLogLog class. (and move to use
HyperLogLogPlus).
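For reference, a minimal sketch of using HyperLogLogPlus from stream-lib directly (the precision value is just illustrative):

import com.clearspring.analytics.stream.cardinality.HyperLogLogPlus

val hll = new HyperLogLogPlus(12)
Seq("a", "b", "a", "c").foreach(hll.offer)
println(hll.cardinality())  // approximate number of distinct elements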
On Tue, May 27, 2014 at 3:01 PM, Surendranauth Hiraman
suren.hira...@velos.io
On Tue, May 27, 2014 at 6:02 PM, Reynold Xin r...@databricks.com wrote:
Would you like to submit a pull request to update it?
Also in the latest version HyperLogLog is serializable. That means we can
get rid of the SerializableHyperLogLog class. (and move to use
HyperLogLogPlus
properly and I'm
having various dependency issues running sbt/sbt assembly.
Any chance you could go ahead and submit a pull request for this if it's
easy for you? :-)
-Suren
On Tue, May 27, 2014 at 6:13 PM, Reynold Xin r...@databricks.com wrote:
2.7 sounds good. I was actually waiting for 2.7
It is actually pretty simple. You will first need to fork Spark on github,
and then push your changes to it, and then follow:
https://help.github.com/articles/using-pull-requests
On Tue, May 27, 2014 at 6:10 PM, innowireless TaeYun Kim
taeyun@innowireless.co.kr wrote:
I'm afraid I don't
Take a look at this one: https://issues.apache.org/jira/browse/SPARK-1188
It was an optimization that added user inconvenience. We got rid of that
now in Spark 1.0.
On Wed, May 28, 2014 at 11:48 PM, Michael Malak michaelma...@yahoo.comwrote:
Shouldn't I be seeing N2 and N4 in the output
Can you take a look at the latest Spark 1.0 docs and see if they are fixed?
https://github.com/apache/spark/tree/master/docs
Thanks.
On Thu, May 29, 2014 at 5:29 AM, Lizhengbing (bing, BIPA)
zhengbing...@huawei.com wrote:
The instruction address is in
I think the main concern is this would require scanning the data twice, and
maybe the user should be aware of it ...
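To illustrate why two scans are needed, here is a rough sketch of a prefix sum over an RDD[Long]; the names are illustrative, not a proposed API:

import org.apache.spark.rdd.RDD

def prefixSums(rdd: RDD[Long]): RDD[Long] = {
  // Pass 1: per-partition totals, used to compute each partition's starting offset.
  val partSums = rdd.mapPartitionsWithIndex { (i, it) => Iterator((i, it.sum)) }
    .collect().sortBy(_._1).map(_._2)
  val offsets = partSums.scanLeft(0L)(_ + _)
  // Pass 2: running sum within each partition, shifted by that partition's offset.
  rdd.mapPartitionsWithIndex { (i, it) => it.scanLeft(offsets(i))(_ + _).drop(1) }
}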
On Thu, Jun 5, 2014 at 10:29 AM, Andrew Ash and...@andrewash.com wrote:
I have a use case that would greatly benefit from RDDs having a .scanLeft()
method. Are the project
If you are interested in openstack/swift integration with Spark, please
drop me a line. We are looking into improving the integration.
Thanks.
Thanks for sending the update. Do you mind posting a link to the bug
reported in the lzf project here as well? Cheers.
On Sun, Jun 15, 2014 at 7:04 PM, gchen chenguanch...@gmail.com wrote:
To anyone who is interested in this issue, the root cause is from third-party
code
I think you guys are / will be leading the effort on that :)
On Mon, Jun 16, 2014 at 4:15 PM, gchen chenguanch...@gmail.com wrote:
Hi Reynold, thanks for your interest in this issue. The work here is part of
incorporating Spark into the PowerLinux ecosystem.
Here is the bug raised in ning by
It is here:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/io/CompressionCodec.scala
On Mon, Jun 16, 2014 at 4:26 PM, gchen chenguanch...@gmail.com wrote:
I didn't find ning's source code in the Spark git repository (or maybe I missed
it?), so next time when we
It is actually pluggable. You can implement new compression codecs and just
change the config variable to use those.
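A rough sketch of the shape of such a codec; the class name is illustrative, and the stream wrapping is where a real compressor would go:

import java.io.{InputStream, OutputStream}
import org.apache.spark.SparkConf
import org.apache.spark.io.CompressionCodec

class MyCodec(conf: SparkConf) extends CompressionCodec {
  // Wrap the streams with a real compression implementation here.
  override def compressedOutputStream(s: OutputStream): OutputStream = s
  override def compressedInputStream(s: InputStream): InputStream = s
}

// Selected via configuration, e.g. spark.io.compression.codec=com.example.MyCodec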
On Tuesday, June 17, 2014, gchen chenguanch...@gmail.com wrote:
Cool, so maybe when we switch to Snappy instead of LZF, we can work around
the bug until the LZF upstream fixes it,
Hi Michael,
Unfortunately the Apache mailing list filters out attachments. That said,
you can usually just start by looking at the JIRA for Spark and find issues
tagged with the starter tag and work on them. You can submit pull requests
to the GitHub repo or email the dev list for feedback on
Thanks for the message.
There is an open issue about the public type / schema system that is
related to this topic: https://issues.apache.org/jira/browse/SPARK-2179
You probably want to comment on that ticket as well.
On Sat, Jun 21, 2014 at 7:52 AM, guxiaobo1982 guxiaobo1...@qq.com wrote:
Mridul,
Can you comment a little bit more on this issue? We are running into the
same stack trace but not sure whether it is just different Spark versions
on each cluster (doesn't seem likely) or a bug in Spark.
Thanks.
On Sat, May 17, 2014 at 4:41 AM, Mridul Muralidharan mri...@gmail.com
IntelliJ's parser/analyzer/compiler behaves differently from the Scala
compiler, and sometimes leads to inconsistent behavior. This is one of those
cases.
In general while we use IntelliJ, we don't use it to build stuff. I
personally always build on the command line with sbt or Maven.
On Thu, Jun 26, 2014
Responded on the jira...
On Thu, Jun 26, 2014 at 9:17 PM, Bharath Ravi Kumar reachb...@gmail.com
wrote:
Hi,
I've been encountering an NPE invoking reduceByKey on JavaPairRDD since
upgrading to 1.0.0. The issue is straightforward to reproduce with 1.0.0
and doesn't occur with 0.9.0. The
I was actually talking to tgraves today at the summit about this.
Based on my understanding, the sizes we track and send (which is
unfortunately O(M*R) regardless of how we change the implementation --
whether we send via task or send via MapOutputTracker) are only used to
compute maxBytesInFlight
Yes it would be great to mention the JIRA ticket number on the pull
request. Thanks!
On Wed, Jul 2, 2014 at 1:01 AM, Eustache DIEMERT eusta...@diemert.fr
wrote:
Hi there,
I just created an issue [1] for MLlib on Jira. I also want to contribute a
fix, is it a good idea to submit a PR on
This blog post probably clarifies a lot of things:
http://databricks.com/blog/2014/07/01/shark-spark-sql-hive-on-spark-and-the-future-of-sql-on-spark.html
On Tue, Jul 8, 2014 at 12:24 PM, anishs...@yahoo.co.in
anishs...@yahoo.co.in wrote:
Hi All
I read somewhere that Cloudera announced
Maybe it's time to create an advanced mode in the UI.
On Wed, Jul 9, 2014 at 12:23 PM, Kay Ousterhout k...@eecs.berkeley.edu
wrote:
Hi all,
I've been doing a bunch of performance measurement of Spark and, as part of
doing this, added metrics that record the average CPU utilization, disk
Also take a look at this:
https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals
On Fri, Jul 11, 2014 at 10:29 AM, Andrew Or and...@databricks.com wrote:
Hi Egor,
Here are a few answers to your questions:
1) Python needs to be installed on all machines, but not pyspark. The
Ian,
The LZFOutputStream's large byte buffer is sort of annoying. It is much
smaller if you use the Snappy one. The downside of the Snappy one is
slightly less compression (I've seen 10 - 20% larger sizes).
If we can find a compression scheme implementation that doesn't do very
large buffers,
Hi Spark devs,
I was looking into the memory usage of shuffle, and one annoying thing about
the default compression codec (LZF) is that the implementation we use
allocates buffers pretty generously. I did a simple experiment and found
that creating 1000 LZFOutputStream allocated 198976424 bytes
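A rough sketch of one way to reproduce that kind of measurement (not necessarily the exact experiment; numbers depend on the library version and JVM):

import java.io.ByteArrayOutputStream
import com.ning.compress.lzf.LZFOutputStream

val before = { System.gc(); Runtime.getRuntime.totalMemory - Runtime.getRuntime.freeMemory }
val streams = (1 to 1000).map(_ => new LZFOutputStream(new ByteArrayOutputStream()))
val after = { System.gc(); Runtime.getRuntime.totalMemory - Runtime.getRuntime.freeMemory }
println(s"~${after - before} bytes retained by ${streams.size} streams")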
Copying Jon here since he worked on the lzf library at Ning.
Jon - any comments on this topic?
On Mon, Jul 14, 2014 at 3:54 PM, Matei Zaharia matei.zaha...@gmail.com
wrote:
You can actually turn off shuffle compression by setting
spark.shuffle.compress to false. Try that out, there will
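For example, when building the SparkConf (standard config key):

val conf = new org.apache.spark.SparkConf().set("spark.shuffle.compress", "false")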
of an algorithm, or multiple map functions, or stuff like that).
But they won't have to broadcast something they only use once.
Matei
On Jul 16, 2014, at 10:07 PM, Reynold Xin r...@databricks.com wrote:
Oops - the pull request should be
https://github.com/apache/spark/pull/1452
On Wed, Jul 16
+1
On Thursday, July 17, 2014, Matei Zaharia matei.zaha...@gmail.com wrote:
+1
Tested on Mac, verified CHANGES.txt is good, verified several of the bug
fixes.
Matei
On Jul 17, 2014, at 11:12 AM, Xiangrui Meng men...@gmail.com
wrote:
I start the voting with a +1.
Ran
Thanks :)
FYI the pull request has been merged and will be part of Spark 1.1.0.
On Thu, Jul 17, 2014 at 11:09 AM, Nicholas Chammas
nicholas.cham...@gmail.com wrote:
On Thu, Jul 17, 2014 at 1:23 AM, Stephen Haberman
stephen.haber...@gmail.com wrote:
I'd be ecstatic if more major changes
I added an automated testing section:
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-AutomatedTesting
Can you take a look to see if it is what you had in mind?
On Mon, Jul 21, 2014 at 3:54 PM, Nicholas Chammas
nicholas.cham...@gmail.com wrote:
”. Someone contributing to
PySpark will want to be directed to run something in addition to (or
instead of) sbt/sbt test, I believe.
Nick
On Mon, Jul 21, 2014 at 11:43 PM, Reynold Xin r...@databricks.com wrote:
I added an automated testing section:
https://cwiki.apache.org/confluence
Thanks for the thoughtful email, Neil and Christopher.
If I understand this correctly, it seems like the dynamic variable is just
a variant of the accumulator (a static one since it is a global object).
Accumulators are already implemented using thread-local variables under the
hood. Am I
If the purpose is dropping CSV headers, perhaps we don't really need a
general drop, only one that drops the first line in a file? I'd really
try hard to avoid a general drop/dropWhile because they can be expensive to
do.
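For reference, the usual workaround sketch for a single header line, assuming an existing SparkContext named sc:

val lines = sc.textFile("data.csv")
val noHeader = lines.mapPartitionsWithIndex { (i, it) =>
  // Drop only the first line of the first partition.
  if (i == 0 && it.hasNext) { it.next(); it } else it
}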
Note that I think we will be adding this functionality (ignoring
...@cloudera.com
wrote:
It could make sense to add a skipHeader argument to SparkContext.textFile?
On Mon, Jul 21, 2014 at 10:37 PM, Reynold Xin r...@databricks.com wrote:
If the purpose is for dropping csv headers, perhaps we don't really need
a
common drop and only one that drops the first
, Reynold Xin r...@databricks.com wrote:
There is one piece of information that'd be useful to know, which is the
source of the input. Even in the presence of an IOException, the input
metrics still specifies the task is reading from Hadoop.
However, I'm slightly confused by this -- I think
To run through all the tests you'd need to create the assembly jar first.
I've seen this asked a few times. Maybe we should make it more obvious.
http://spark.apache.org/docs/latest/building-with-maven.html
Spark Tests in Maven
Tests are run by default via the ScalaTest Maven plugin
-Pyarn -Phadoop-2.3 -Phive test
AFA documentation, yes adding another sentence to that same Building with
Maven page would likely be helpful to future generations.
2014-07-27 19:10 GMT-07:00 Reynold Xin r...@databricks.com:
To run through all the tests you'd need to create the assembly jar
You can use publish-local in sbt.
If you want to be more careful, you can give Spark a different version
number and use that version number in your app.
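For example, after publishing locally with a custom version string (the version shown is illustrative), the app's build.sbt would depend on it:

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0-SNAPSHOT-myfix"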
On Mon, Jul 28, 2014 at 4:33 AM, Larry Xiao xia...@sjtu.edu.cn wrote:
Hi,
How do you package an app with a modified Spark?
It seems sbt
Hi devs,
I don't know if this is going to help, but if you can watch and vote on the
ticket, it might help ASF INFRA prioritize and triage it faster:
https://issues.apache.org/jira/browse/INFRA-8116
Please do. Thanks!
On Mon, Jul 28, 2014 at 5:41 PM, Patrick Wendell pwend...@gmail.com wrote:
Would something like this help?
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/PartitionPruningRDD.scala
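A minimal sketch of how it is used, assuming an existing RDD named rdd (the predicate on partition indices is illustrative):

import org.apache.spark.rdd.PartitionPruningRDD

// Keeps only the partitions whose index passes the filter; the rest are never computed.
val pruned = PartitionPruningRDD.create(rdd, partitionIndex => partitionIndex < 10)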
On Thu, Jul 24, 2014 at 8:40 AM, Eugene Cheipesh echeip...@gmail.com
wrote:
Hello,
I have an interesting use case for a pre-filtered RDD. I have
Message-
From: Reynold Xin [mailto:r...@databricks.com]
Sent: Tuesday, July 29, 2014 12:55 AM
To: dev@spark.apache.org
Subject: Re: pre-filtered hadoop RDD use case
Would something like this help?
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd
+1 on this.
On Tue, Jul 29, 2014 at 4:34 PM, Mark Hamstra m...@clearstorydata.com
wrote:
Of late, I've been coming across quite a few pull requests and associated
JIRA issues that contain nothing indicating their purpose beyond a pretty
minimal description of what the pull request does. On
Thanks for your interest.
I think the main challenge is if we have to call Python functions per
record, it can be pretty expensive to serialize/deserialize across
boundaries of the Python process and JVM process. I don't know if there is
a good way to solve this problem yet.
On Fri, Aug 1,
I'm pretty sure it is an oversight. Would you like to submit a pull request
to fix that?
On Tue, Aug 5, 2014 at 12:14 PM, Stephen Boesch java...@gmail.com wrote:
Within its compute.close method, the JdbcRDD class has this interesting
logic for closing the JDBC connection:
try {
for another reason, not intending to be a
bother ;)
2014-08-05 13:03 GMT-07:00 Reynold Xin r...@databricks.com:
I'm pretty sure it is an oversight. Would you like to submit a pull
request to fix that?
On Tue, Aug 5, 2014 at 12:14 PM, Stephen Boesch java...@gmail.com
wrote:
Within
it.
As for the leaking in the case of malformed statements, isn't that
addressed by
context.addOnCompleteCallback { () => closeIfNeeded() }
or am I misunderstanding?
On Tue, Aug 5, 2014 at 3:15 PM, Reynold Xin r...@databricks.com wrote:
Thanks. Those are definitely great problems to fix!
On Tue, Aug 5
I don't think it was a conscious design decision to not include the
application classes in the connection manager serializer. We should fix
that. Where is it deserializing data in that thread?
4 might make sense in the long run, but it adds a lot of complexity to the
code base (whole separate
ScalaTest actually has support for parallelization built-in. We can use
that.
The main challenge is to make sure all the test suites can work in parallel
when running alongside each other.
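For example, a suite can opt in to ScalaTest's built-in parallelism (a sketch; the suite is illustrative):

import org.scalatest.{FunSuite, ParallelTestExecution}

class MySuite extends FunSuite with ParallelTestExecution {
  test("runs in parallel with the other tests in this suite") {
    assert(1 + 1 == 2)
  }
}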
On Fri, Aug 8, 2014 at 9:47 AM, Ted Yu yuzhih...@gmail.com wrote:
How about using parallel execution
://github.com/apache/spark/blob/master/project/SparkBuild.scala#L350
On Fri, Aug 8, 2014 at 10:10 AM, Reynold Xin r...@databricks.com wrote:
ScalaTest actually has support for parallelization built-in. We can use
that.
The main challenge is to make sure all the test suites can work in
parallel when
Looks like you didn't actually paste the exception message. Do you mind
doing that?
On Fri, Aug 8, 2014 at 10:14 AM, Reynold Xin r...@databricks.com wrote:
Pasting a better formatted trace:
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1180
Pasting a better formatted trace:
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1180)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
at
scala.collection.mutable.HashMap$$anonfun$writeObject$1.apply(HashMap.scala:137)
at
, Reynold Xin r...@databricks.com
wrote:
Looks like you didn't actually paste the exception message. Do you mind
doing that?
On Fri, Aug 8, 2014 at 10:14 AM, Reynold Xin r...@databricks.com wrote:
Pasting a better formatted trace:
at java.io.ObjectOutputStream.writeObject0
I created a JIRA ticket to track this:
https://issues.apache.org/jira/browse/SPARK-2928
Let me know if you need help with it.
On Fri, Aug 8, 2014 at 10:40 AM, Reynold Xin r...@databricks.com wrote:
Yes, I'm pretty sure it doesn't actually use the right serializer in
TorrentBroadcast:
https
.
I can compare Spark-1.0.1 code and see what's going on...
Thanks,
Ron
On Friday, August 8, 2014 10:43 AM, Reynold Xin r...@databricks.com
wrote:
I created a JIRA ticket to track this:
https://issues.apache.org/jira/browse/SPARK-2928
Let me know if you need help with it.
On Fri
They only compared their own implementations of a couple of algorithms on
different platforms rather than comparing the different platforms
themselves (in the case of Spark -- PySpark). I can write two variants of
an algorithm on Spark and make them perform drastically differently.
I have no doubt if
, Reynold Xin r...@databricks.com wrote:
They only compared their own implementations of couple algorithms on
different platforms rather than comparing the different platforms
themselves (in the case of Spark -- PySpark). I can write two variants of
an algorithm on Spark and make them perform
BTW you can find the original Presto (rebranded as Distributed R) paper
here:
http://eurosys2013.tudos.org/wp-content/uploads/2013/paper/Venkataraman.pdf
On Wed, Aug 13, 2014 at 2:16 PM, Reynold Xin r...@databricks.com wrote:
Actually I believe the same person started both projects
I haven't read the code yet, but if it is what I think it is, this is
SUPER, UBER, HUGELY useful.
On a related note, I asked about this on the Scala dev list but never got a
satisfactory answer
https://groups.google.com/forum/#!msg/scala-internals/_cZ1pK7q6cU/xyBQA0DdcYwJ
On Wed, Aug 13,
Hi devs,
I posted a design doc proposing an interface for pluggable block transfer
(used in shuffle, broadcast, block replication, etc). This is expected to
be done in 1.2 time frame.
It should make our code base cleaner, and enable us to provide alternative
implementations of block transfers
the deserialisation to happen on that thread. See
MemoryStore.scala:102.
On 7 August 2014 11:53, Reynold Xin r...@databricks.com wrote:
I don't think it was a conscious design decision to not include the
application classes in the connection manager serializer. We should
fix
. The above approach wouldn't help with this problem.
Additionally, the YARN scheduler currently uses this approach of adding
the application jar to the Executor classpath, so it would make things a
bit more uniform.
Cheers,
Graham
On 14 August 2014 17:37, Reynold Xin r...@databricks.com
I believe docs changes can go in anytime (because we can just publish new
versions of docs).
Critical bug fixes can still go in too.
On Thu, Aug 21, 2014 at 11:43 PM, Evan Chan velvia.git...@gmail.com wrote:
I'm hoping to get in some doc enhancements and small bug fixes for Spark
SQL.
Also
Great idea. Added the link
https://github.com/apache/spark/blob/master/README.md
On Thu, Aug 21, 2014 at 4:06 PM, Nicholas Chammas
nicholas.cham...@gmail.com wrote:
We should add this link to the readme on GitHub btw.
On Thursday, August 21, 2014, Henry Saputra henry.sapu...@gmail.com wrote:
The
Hi Rajendran,
I'm assuming you have some concept of schema and you are intending to
integrate with SchemaRDD instead of normal RDDs.
More responses inline below.
On Fri, Aug 22, 2014 at 2:21 AM, Rajendran Appavu appra...@in.ibm.com
wrote:
I am new to Spark source code and looking to see if
Linking to the JIRA tracking APIs to hook into the planner:
https://issues.apache.org/jira/browse/SPARK-3248
On Wed, Aug 27, 2014 at 1:56 PM, Reynold Xin r...@databricks.com wrote:
Hi Rajendran,
I'm assuming you have some concept of schema and you are intending to
integrate with SchemaRDD
Thanks for doing this, Shane.
On Thursday, August 28, 2014, shane knapp skn...@berkeley.edu wrote:
all clear: jenkins and all plugins have been updated!
On Thu, Aug 28, 2014 at 7:51 AM, shane knapp skn...@berkeley.edu
wrote:
jenkins is upgraded, but a few jobs sneaked in
Sending the response back to the dev list so this is indexable and
searchable by others.
-- Forwarded message --
From: Milos Nikolic milos.nikoli...@gmail.com
Date: Sat, Aug 30, 2014 at 5:50 PM
Subject: Re: Partitioning strategy changed in Spark 1.0.x?
To: Reynold Xin r
Welcome, Shane!
On Tuesday, September 2, 2014, shane knapp skn...@berkeley.edu wrote:
so, i had a meeting w/the databricks guys on friday and they recommended i
send an email out to the list to say 'hi' and give you guys a quick intro.
:)
hi! i'm shane knapp, the new AMPLab devops
Having an SSD helps tremendously with assembly time.
Without that, you can do the following in order for Spark to pick up the
compiled classes before assembly at runtime.
export SPARK_PREPEND_CLASSES=true
On Tue, Sep 2, 2014 at 9:10 AM, Sandy Ryza sandy.r...@cloudera.com wrote:
This doesn't
I think in general that is fine. It would be great if your slides come with
proper attribution.
On Tue, Sep 2, 2014 at 3:31 PM, Sanghoon Lee phoenixl...@gmail.com wrote:
Hi, I am phoenixlee, a Spark programmer in Korea.
This seems like a good chance to teach college students
+1
On Tue, Sep 2, 2014 at 3:08 PM, Cheng Lian lian.cs@gmail.com wrote:
+1
- Tested Thrift server and SQL CLI locally on OSX 10.9.
- Checked datanucleus dependencies in distribution tarball built by
make-distribution.sh without SPARK_HIVE defined.
On Tue, Sep 2, 2014 at
+1
Tested locally on Mac OS X with local-cluster mode.
On Wed, Sep 3, 2014 at 12:24 AM, Patrick Wendell pwend...@gmail.com wrote:
I'll kick it off with a +1
On Wed, Sep 3, 2014 at 12:24 AM, Patrick Wendell pwend...@gmail.com
wrote:
Please vote on releasing the following candidate as
be willing to put some creative commons license
information on the site and its content?
best,
matt
On 09/02/2014 06:32 PM, Reynold Xin wrote:
I think in general that is fine. It would be great if your slides come
with
proper attribution.
On Tue, Sep 2, 2014 at 3:31 PM, Sanghoon Lee phoenixl
That would require GitHub hooks permission, and unfortunately ASF INFRA
wouldn't allow that.
Maybe they will change their mind one day, but so far we have asked about this
and the answer has been no for security reasons.
On Saturday, September 6, 2014, Nicholas Chammas nicholas.cham...@gmail.com
Can you be a little bit more specific, maybe give a code snippet?
On Tue, Sep 9, 2014 at 5:14 PM, Sudershan Malpani
sudershan.malp...@gmail.com wrote:
Hi all,
I am calling an object which in turn is calling a method inside an RDD map
in Spark. While writing the tests, how can I mock that
I don't think so. We should probably add a line to log it.
On Thursday, September 11, 2014, Sandy Ryza sandy.r...@cloudera.com wrote:
After the change to broadcast all task data, is there any easy way to
discover the serialized size of the data getting sent down for a task?
thanks,
-Sandy
I didn't know about that
On Thu, Sep 11, 2014 at 6:29 PM, Sandy Ryza sandy.r...@cloudera.com wrote:
It used to be available on the UI, no?
On Thu, Sep 11, 2014 at 6:26 PM, Reynold Xin r...@databricks.com wrote:
I don't think so. We should probably add a line to log
at 6:33 PM, Reynold Xin r...@databricks.com wrote:
I didn't know about that
On Thu, Sep 11, 2014 at 6:29 PM, Sandy Ryza sandy.r...@cloudera.com wrote:
It used to be available
Thanks for the email, Erik.
The Scala collection library implementation is a complicated beast ...
On Sat, Sep 6, 2014 at 8:27 AM, Erik Erlandson e...@redhat.com wrote:
I tripped over this recently while preparing a solution for SPARK-3250
(efficient sampling):
Iterator 'drop' method has a
Xiangrui can comment more, but I believe he and Joseph are actually working
on a standardized interface and pipeline feature for the 1.2 release.
On Fri, Sep 12, 2014 at 8:20 AM, Egor Pahomov pahomov.e...@gmail.com
wrote:
Some architectural suggestions on this matter -
I like that idea, but the load on Jenkins isn't very high. The more
complexity we add to the test script, the easier it is to screw it up (at
some point we would need to add unit tests for the build scripts).
Maybe we can just add the message part, so it becomes clear that a pull
request does not
I'm not familiar with Infiniband, but I can chime in on the Spark part.
There are two kinds of communications in Spark: control plane and data
plane. Task scheduling / dispatching is control, whereas fetching a block
(e.g. shuffle) is data.
On Tue, Sep 16, 2014 at 4:22 PM, Trident
This is during shutdown right? Looks ok to me since connections are being
closed. We could've handled this more gracefully, but the logs look
harmless.
On Wednesday, September 17, 2014, wyphao.2007 wyphao.2...@163.com wrote:
Hi, when I run a Spark job on YARN, the job finishes successfully, but I