You are saying the RDD lineage must be serialized, otherwise we could not
recreate it after a node failure. This is false. The RDD lineage is not
serialized. It is only relevant to the driver application, so it is simply
kept in memory there. If the driver application
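(For anyone curious how to inspect that lineage: a minimal sketch, assuming
a SparkContext named sc. toDebugString prints the graph the driver keeps.)

val base = sc.parallelize(1 to 1000)
val derived = base.map(_ * 2).filter(_ % 3 == 0)
// The lineage lives in these RDD objects on the driver. toDebugString
// prints the graph; Spark replays it to recompute lost partitions.
println(derived.toDebugString)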
Awesome, thanks! It's very helpful for preparing for the migration. Do you
plan to push 2.0.0-preview to Maven too? (I for one would appreciate the
convenience.)
On Wed, May 25, 2016 at 8:44 AM, Reynold Xin wrote:
> In the past the Spark community have created preview
Hi Nitin,
Sorry for reviving this ancient thread. That's a fantastic set of JVM
flags! We just hit the same problem, but we hadn't even discovered all of
those flags for limiting memory growth. Did you ever discover anything
further?
I see you also set -XX:NewRatio=3. This is a
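In case it helps others who land on this thread, a minimal sketch of where
such flags usually go. The flag values below are placeholders I made up,
not recommendations:

import org.apache.spark.SparkConf

// Placeholder GC flags -- tune for your own workload.
val conf = new SparkConf()
  .set("spark.executor.extraJavaOptions",
    "-XX:NewRatio=3 -XX:MaxMetaspaceSize=256m -XX:+UseG1GC")

For the driver itself, the equivalent flags go on spark-submit's
--driver-java-options.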
I haven't tried this, but I believe you can run the Thrift server in Spark
and then connect with the HiveServer2 JDBC driver:
http://spark.apache.org/docs/1.6.1/sql-programming-guide.html#running-the-thrift-jdbcodbc-server
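Untested, but a minimal sketch of what the client side would look like with
the HiveServer2 JDBC driver (host, port, and table name are hypothetical;
10000 is the Thrift server's default port):

import java.sql.DriverManager

// Uses the HiveServer2 JDBC driver to talk to the Spark Thrift server.
Class.forName("org.apache.hive.jdbc.HiveDriver")
val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default")
val rs = conn.createStatement().executeQuery("SELECT count(*) FROM my_table")
while (rs.next()) println(rs.getLong(1))
conn.close()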
On Fri, Mar 25, 2016 at 7:57 AM, Reynold Xin wrote:
There is related discussion in
https://issues.apache.org/jira/browse/SPARK-8836. It's not too hard to
implement this without modifying Spark, and we measured a ~10x improvement
over plain RDD joins. I haven't benchmarked against DataFrames -- maybe
they also realize this performance advantage.
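As a rough illustration of the idea -- this is generic co-partitioning, not
necessarily the exact approach above; leftRaw and rightRaw are hypothetical
pair RDDs:

import org.apache.spark.HashPartitioner

// Pre-partition both sides with the same partitioner (and cache them if
// they are reused). A join between co-partitioned RDDs needs no shuffle.
val partitioner = new HashPartitioner(128)
val left = leftRaw.partitionBy(partitioner).cache()
val right = rightRaw.partitionBy(partitioner).cache()
val joined = left.join(right)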
On
YARN may be a workaround.
On Thu, Feb 18, 2016 at 4:13 PM, Ashish Soni wrote:
> Hi All ,
>
> Just wanted to know if there is any workaround or resolution for the
> issue below in standalone mode
>
> https://issues.apache.org/jira/browse/SPARK-9559
>
> Ashish
>
On Tue, Feb 2, 2016 at 7:10 PM, Michael Armbrust
wrote:
> What about the memory leak bug?
>> https://issues.apache.org/jira/browse/SPARK-11293
>> Even after the memory rewrite in 1.6.0, it still happens in some cases.
>> Will it be fixed for 1.6.1?
>>
>
> I think we have
+1 (non-binding)
It passes our tests after we registered 6 new classes with Kryo:
kryo.register(classOf[org.apache.spark.sql.catalyst.expressions.UnsafeRow])
kryo.register(classOf[Array[org.apache.spark.mllib.tree.model.Split]])
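(For reference, a minimal sketch of how such registrations are typically
wired in. The registrator class name is made up, and only the two
registrations quoted above are shown:)

import com.esotericsoftware.kryo.Kryo
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoRegistrator

// Registers the classes above with Kryo when serializers are created.
class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    kryo.register(classOf[org.apache.spark.sql.catalyst.expressions.UnsafeRow])
    kryo.register(classOf[Array[org.apache.spark.mllib.tree.model.Split]])
  }
}

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "MyRegistrator")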
Actions trigger jobs. A job is made up of stages. A stage is made up of
tasks. Executor threads execute tasks.
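For example (input path made up): one action triggers one job; the shuffle
in reduceByKey splits the job into two stages; each stage runs as one task
per partition on the executors.

val counts = sc.textFile("input.txt")
  .flatMap(_.split(" "))
  .map(w => (w, 1))    // stage 1: read, split, map
  .reduceByKey(_ + _)  // shuffle boundary: stage 2 starts here
counts.count()         // the action that triggers the job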
Does that answer your question?
On Mon, Oct 5, 2015 at 12:52 PM, Guna Prasaad wrote:
> What is the difference between a task and a job in Spark and
>
It's already possible to just copy the code from countApproxDistinct
https://github.com/apache/spark/blob/v1.4.0/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L1153
and
access the HLL directly, or do anything you like.
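For comparison, the public API looks like this (relativeSD is the target
relative accuracy of the underlying HyperLogLog estimate):

val rdd = sc.parallelize(1 to 1000000).map(_ % 1000)
println(rdd.countApproxDistinct(relativeSD = 0.05))  // roughly 1000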
On Wed, Jul 1, 2015 at 5:26 PM, Nick Pentreath nick.pentre...@gmail.com
Hi Vasili,
It so happens that the entire SparkR code was merged into Apache Spark in a
single pull request, so you can see all the required changes at once in
https://github.com/apache/spark/pull/5096. It's 12,043 lines, and as I
understand it, it took more than 20 people about a year to write.
On Mon,
Check out http://stackoverflow.com/a/26051042/3318517. It's a nice method
for saving the RDD into separate files by key in a single pass. Then you
can read the files into separate RDDs.
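If I remember right, that answer uses Hadoop's MultipleTextOutputFormat; a
sketch of the idea (output path and value type are hypothetical, and pairs
is a hypothetical RDD[(String, String)]):

import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat

// Writes each key's values to a file named after the key, in one pass.
class RDDMultipleTextOutputFormat extends MultipleTextOutputFormat[Any, Any] {
  // Don't repeat the key inside the file...
  override def generateActualKey(key: Any, value: Any): Any = NullWritable.get()
  // ...use it as the file name instead.
  override def generateFileNameForKeyValue(key: Any, value: Any, name: String): String =
    key.asInstanceOf[String]
}

pairs.saveAsHadoopFile("/tmp/by-key", classOf[String], classOf[String],
  classOf[RDDMultipleTextOutputFormat])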
On Wed, Apr 29, 2015 at 2:10 PM, Juan Rodríguez Hortalá
juan.rodriguez.hort...@gmail.com wrote:
Hi