[
https://issues.apache.org/jira/browse/SPARK-2661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-2661:
-
Fix Version/s: 1.1.0
Unpersist last RDD in bagel iteration
[
https://issues.apache.org/jira/browse/SPARK-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-2047.
--
Resolution: Fixed
Fix Version/s: 1.1.0
Use less memory
[
https://issues.apache.org/jira/browse/SPARK-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-2047:
-
Assignee: Aaron Davidson
Use less memory in AppendOnlyMap.destructiveSortedIterator
Is the first() being computed locally on the driver program? Maybe it's too
hard to compute with the memory, etc. available there. Take a look at the
driver's log and see whether it has the message "Computing the requested
partition locally".
Matei
On Jul 22, 2014, at 12:04 PM, Nathan Kronenfeld
[
https://issues.apache.org/jira/browse/SPARK-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-2494:
-
Priority: Major (was: Blocker)
Hash of None is different cross machines in CPython
[
https://issues.apache.org/jira/browse/SPARK-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-2494:
-
Affects Version/s: 0.9.2
0.9.0
0.9.1
Hash of None
[
https://issues.apache.org/jira/browse/SPARK-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-2494:
-
Fix Version/s: (was: 1.0.1)
(was: 1.0.0)
0.9.3
[
https://issues.apache.org/jira/browse/SPARK-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-2494:
-
Target Version/s: 1.1.0, 1.0.2, 0.9.3 (was: 1.1.0, 1.0.2)
Hash of None is different cross
[
https://issues.apache.org/jira/browse/SPARK-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-2494:
-
Assignee: Davies Liu
Hash of None is different cross machines in CPython
Actually the script in the master branch is also broken (it's pointing to an
older AMI). Try 1.0.1 for launching clusters.
On Jul 20, 2014, at 2:25 PM, Chris DuBois chris.dub...@gmail.com wrote:
I pulled the latest last night. I'm on commit 4da01e3.
On Sun, Jul 20, 2014 at 2:08 PM, Matei
Is this with the 1.0.0 scripts? I believe it's fixed in 1.0.1.
Matei
On Jul 20, 2014, at 1:22 AM, Chris DuBois chris.dub...@gmail.com wrote:
Using the spark-ec2 script with m3.2xlarge instances seems to not have /mnt
and /mnt2 pointing to the 80 GB SSDs that come with that instance. Does
[
https://issues.apache.org/jira/browse/SPARK-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-2553:
-
Assignee: Sandy Ryza
CoGroupedRDD unnecessarily allocates a Tuple2 per dep per key
[
https://issues.apache.org/jira/browse/SPARK-2553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-2553.
--
Resolution: Fixed
Target Version/s: 1.1.0
CoGroupedRDD unnecessarily allocates
[
https://issues.apache.org/jira/browse/SPARK-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia reassigned SPARK-2045:
Assignee: Matei Zaharia
Sort-based shuffle implementation
Matei Zaharia created SPARK-2558:
Summary: Mention --queue argument in YARN documentation
Key: SPARK-2558
URL: https://issues.apache.org/jira/browse/SPARK-2558
Project: Spark
Issue Type
[
https://issues.apache.org/jira/browse/SPARK-2558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-2558:
-
Labels: Starter (was: )
Mention --queue argument in YARN documentation
Hi Kamal,
This is not what preservesPartitioning does -- actually what it means is that
if the RDD has a Partitioner set (which means it's an RDD of key-value pairs
and the keys are grouped in a known way, e.g. hashed or range-partitioned),
your map function is not changing the partition of
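To make the distinction concrete, here is a minimal sketch (the pair RDD and partition count are assumptions, not taken from the thread): mapValues keeps the partitioner because keys cannot change, a plain map drops it, and mapPartitions lets you assert preservation yourself when you know the keys are untouched.

```scala
import org.apache.spark.HashPartitioner

// Assumed setup: sc is an existing SparkContext.
val pairs = sc.parallelize(Seq(("a", 1), ("b", 2))).partitionBy(new HashPartitioner(4))

val kept    = pairs.mapValues(_ + 1)                  // partitioner preserved
val dropped = pairs.map { case (k, v) => (k, v + 1) } // partitioner lost: Spark can't tell

// With mapPartitions you take responsibility for not changing keys:
val manual = pairs.mapPartitions(
  _.map { case (k, v) => (k, v + 1) },
  preservesPartitioning = true)
```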
It's possible using the --queue argument of spark-submit. Unfortunately this is
not documented on http://spark.apache.org/docs/latest/running-on-yarn.html but
it appears if you just type spark-submit --help or spark-submit with no
arguments.
Matei
On Jul 17, 2014, at 2:33 AM, Konstantin
Good question. I'll ask INFRA because I haven't seen other Apache mailing
lists provide this. It would indeed be helpful.
Matei
On Jul 17, 2014, at 12:59 PM, Nick Chammas nicholas.cham...@gmail.com wrote:
Can we modify the mailing list to include permalinks to the thread in the
footer of
[
https://issues.apache.org/jira/browse/SPARK-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-2048:
-
Description:
In the external spilling code in ExternalAppendOnlyMap and CoGroupedRDD
[
https://issues.apache.org/jira/browse/SPARK-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14064602#comment-14064602
]
Matei Zaharia commented on SPARK-2048:
--
I added one more issue to this BTW, about
Hey Reynold, just to clarify, users will still have to manually broadcast
objects that they want to use *across* operations (e.g. in multiple iterations
of an algorithm, or multiple map functions, or stuff like that). But they won't
have to broadcast something they only use once.
Matei
On Jul
Yeah, we try to have a regular 3-month release cycle; see
https://cwiki.apache.org/confluence/display/SPARK/Wiki+Homepage for the current
window.
Matei
On Jul 16, 2014, at 4:21 PM, Mark Hamstra m...@clearstorydata.com wrote:
You should expect master to compile and run: patches aren't merged
[
https://issues.apache.org/jira/browse/SPARK-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-2045:
-
Attachment: (was: Sort-based shuffle design.pdf)
Sort-based shuffle implementation
[
https://issues.apache.org/jira/browse/SPARK-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-2045:
-
Attachment: Sort-based shuffle design.pdf
I've posted a design doc for a simple version
[
https://issues.apache.org/jira/browse/SPARK-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-2045:
-
Attachment: Sort-based shuffle design.pdf
Oops, attached the wrong file before. Here's the right
[
https://issues.apache.org/jira/browse/SPARK-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14063009#comment-14063009
]
Matei Zaharia commented on SPARK-2045:
--
Right now I was thinking it would happen
Yeah, that seems like something we can inline :).
On Jul 15, 2014, at 7:30 PM, Baofeng Zhang pelickzh...@qq.com wrote:
Is Matei following this?
Catalyst uses the Utils to get the ClassLoader which loaded Spark.
Can Catalyst directly do getClass.getClassLoader to avoid the dependency
on
Hi Nathan,
I think there are two possible reasons for this. One is that even though you
are caching RDDs, their lineage chain gets longer and longer, and thus
serializing each RDD takes more time. You can cut off the chain by using
RDD.checkpoint() periodically, say every 5-10 iterations. The
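A sketch of the checkpointing suggestion, under assumed names (`initialRanks` and `step` stand in for the iteratively updated RDD and its per-iteration transformation; the HDFS path is illustrative):

```scala
sc.setCheckpointDir("hdfs:///tmp/checkpoints") // assumed path

var ranks = initialRanks.cache()
for (i <- 1 to 30) {
  ranks = step(ranks).cache()
  if (i % 10 == 0) {
    ranks.checkpoint() // cut the lineage chain every ~10 iterations
    ranks.count()      // force materialization so the checkpoint is written
  }
}
```

After a checkpoint, the RDD's lineage is replaced by the checkpointed files, so serializing tasks over it no longer grows with the iteration count.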
Yeah, this is handled by the commit call of the FileOutputFormat. In general,
Hadoop OutputFormats have a concept called committing the output, which you
should do only once per partition. The file-based ones do an atomic rename to
make sure that the final output is a complete file.
Matei
On
I haven't seen issues using the JVM's own tools (jstack, jmap, hprof and such),
so maybe there's a problem in YourKit or in your release of the JVM. Otherwise
I'd suggest increasing the heap size of the unit tests a bit (you can do this
in the SBT build file). Maybe they are very close to full
Yeah, I'd just add a spark-util that has these things.
Matei
On Jul 14, 2014, at 1:04 PM, Michael Armbrust mich...@databricks.com wrote:
Yeah, sadly this dependency was introduced when someone consolidated the
logging infrastructure. However, the dependency should be very small and
thus
You can actually turn off shuffle compression by setting spark.shuffle.compress
to false. Try that out, there will still be some buffers for the various
OutputStreams, but they should be smaller.
Matei
On Jul 14, 2014, at 3:30 PM, Stephen Haberman stephen.haber...@gmail.com
wrote:
Just a
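The compression setting mentioned above can be sketched as a SparkConf fragment (app name is illustrative); note this flag only controls compression of shuffle outputs, not spark.rdd.compress:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("no-shuffle-compression")
  .set("spark.shuffle.compress", "false") // disable shuffle output compression
val sc = new SparkContext(conf)
```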
You currently can't use SparkContext inside a Spark task, so in this case you'd
have to call some kind of local K-means library. One example you can try to use
is Weka (http://www.cs.waikato.ac.nz/ml/weka/). You can then load your text
files as an RDD of strings with SparkContext.wholeTextFiles
Are you increasing the number of parallel tasks with cores as well? With more
tasks there will be more data communicated and hence more calls to these
functions.
Unfortunately contention is kind of hard to measure, since often the result is
that you see many cores idle as they're waiting on a
I think coalesce with shuffle=true will force it to have one task per node.
Without that, it might be that due to data locality it decides to launch
multiple ones on the same node even though the total # of tasks is equal to the
# of nodes.
If this is the *only* thing you run on the cluster,
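The coalesce suggestion as a one-line sketch, with assumed names (`data` is an existing RDD, `numNodes` the cluster's node count):

```scala
// shuffle = true forces an even redistribution into exactly numNodes
// partitions, so locality can no longer pack several onto one node.
val spread = data.coalesce(numNodes, shuffle = true)
```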
The script should be there, in the spark/bin directory. What command did you
use to launch the cluster?
Matei
On Jul 14, 2014, at 1:12 PM, Josh Happoldt josh.happo...@trueffect.com wrote:
Hi All,
I've used the spark-ec2 scripts to build a simple 1.0.1 Standalone cluster on
EC2. It
of them
here, but if your file is big it will also have at least one task per 32 MB
block of the file.
Matei
On Jul 14, 2014, at 6:39 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
I see, so here might be the problem. With more cores, there's less memory
available per core, and now many of your
You can change this setting through SparkContext.hadoopConfiguration, or put
the conf/ directory of your Hadoop installation on the CLASSPATH when you
launch your app so that it reads the config values from there.
Matei
On Jul 14, 2014, at 8:06 PM, valgrind_girl 124411...@qq.com wrote:
eager
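The programmatic route mentioned above looks like this (the config key and value are assumptions for illustration, not taken from the thread):

```scala
// hadoopConfiguration is the mutable Hadoop Configuration that Spark hands
// to its Hadoop input/output formats; values set here apply to later reads.
sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "YOUR_KEY_ID")
sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET")
```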
The UI code is the same in both, but one possibility is that your executors
were given less memory on YARN. Can you check that? Or otherwise, how do you
know that some RDDs were cached?
Matei
On Jul 12, 2014, at 4:12 PM, Koert Kuipers ko...@tresata.com wrote:
hey shuo,
so far all stage
Unless you can diagnose the problem quickly, Gary, I think we need to go ahead
with this release as is. This release didn't touch the Mesos support as far as
I know, so the problem might be a nondeterministic issue with your application.
But on the other hand the release does fix some critical
Thanks for catching this. For now you can just access the page through http://
instead of https:// to avoid this.
Matei
On Jul 8, 2014, at 10:46 PM, binbinbin915 binbinbin...@live.cn wrote:
https://spark.apache.org/docs/latest/mllib-linear-methods.html#logistic-regression
on Chrome 35 with
They are for CDH4 without YARN, since YARN is experimental in that release. You can
download one of the Hadoop 2 packages if you want to run on YARN. Or you might
have to build specifically against CDH4's version of YARN if that doesn't work.
Matei
On Jul 7, 2014, at 9:37 PM, ch huang
+1
Tested on Mac OS X.
Matei
On Jul 6, 2014, at 1:54 AM, Andrew Or and...@databricks.com wrote:
+1, verified that the UI bug is in fact fixed in
https://github.com/apache/spark/pull/1255.
2014-07-05 20:01 GMT-07:00 Soren Macbeth so...@yieldbot.com:
+1
On Sat, Jul 5, 2014 at 7:41
Matei Zaharia created SPARK-2371:
Summary: Show locally-running tasks (e.g. from take()) in web UI
Key: SPARK-2371
URL: https://issues.apache.org/jira/browse/SPARK-2371
Project: Spark
Issue
Try looking at the running processes with “ps” to see their full command line
and see whether any options are different. It seems like in both cases, your
young generation is quite large (11 GB), which doesn’t make a lot of sense with a
heap of 15 GB. But maybe I’m misreading something.
Matei
Spark SQL in Spark 1.1 will include all the functionality in Shark; take a look
at
http://databricks.com/blog/2014/07/01/shark-spark-sql-hive-on-spark-and-the-future-of-sql-on-spark.html.
We decided to do this because at the end of the day, the only code left in
Shark was the JDBC / Thrift
When you use hadoopConfiguration directly, I don’t think you have to replace
the “/” with “%2f”. Have you tried it without that? Also make sure you’re not
replacing slashes in the URL itself.
Matei
On Jul 2, 2014, at 4:17 PM, Brian Gawalt bgaw...@gmail.com wrote:
Hello everyone,
I'm
I’d suggest asking the IBM Hadoop folks, but my guess is that the library
cannot be found in /opt/IHC/lib/native/Linux-amd64-64/. Or maybe if this
exception is happening in your driver program, the driver program’s
java.library.path doesn’t include this. (SPARK_LIBRARY_PATH from spark-env.sh
+1
Tested it out on Mac OS X and Windows, looked through docs.
Matei
On Jun 26, 2014, at 7:06 PM, Patrick Wendell pwend...@gmail.com wrote:
Please vote on releasing the following candidate as Apache Spark version
1.0.1!
The tag to be voted on is v1.0.1-rc1 (commit 7feeda3):
[
https://issues.apache.org/jira/browse/SPARK-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-1937:
-
Assignee: Rui Li
Tasks can be submitted before executors are registered
[
https://issues.apache.org/jira/browse/SPARK-1937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-1937.
--
Resolution: Fixed
Fix Version/s: 1.1.0
Target Version/s: 1.1.0
Tasks can
Matei Zaharia created SPARK-2248:
Summary: spark.default.parallelism does not apply in local mode
Key: SPARK-2248
URL: https://issues.apache.org/jira/browse/SPARK-2248
Project: Spark
Issue
[
https://issues.apache.org/jira/browse/SPARK-2124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-2124.
--
Resolution: Fixed
Fix Version/s: 1.1.0
Move aggregation into ShuffleManager
Hey Marcelo,
When we did the configuration pull request, we actually avoided having a big
list of defaults in one class file, because this creates a file that all the
components in the project depend on. For example, since we have some settings
specific to streaming and the REPL, do we want
customer targeting, accurate inventory and efficient analysis.
Thanks!
Best Regards,
Sonal
Nube Technologies
On Thu, Jun 12, 2014 at 11:33 PM, Derek Mansen de...@vistarmedia.com wrote:
Awesome, thank you!
On Wed, Jun 11, 2014 at 6:53 PM, Matei Zaharia matei.zaha
[
https://issues.apache.org/jira/browse/SPARK-2206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-2206:
-
Assignee: Manish Amde
Automatically infer the number of classification classes in multiclass
[
https://issues.apache.org/jira/browse/SPARK-2207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-2207:
-
Assignee: Manish Amde
Add minimum information gain and minimum instances per node as training
You should be able to use many of the MLlib Model objects directly in Storm, if
you save them out using Java serialization. The only one that won’t work is
probably ALS, because it’s a distributed model.
Otherwise, you will have to output them in your own format and write code for
evaluating
[
https://issues.apache.org/jira/browse/SPARK-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-1112:
-
Affects Version/s: 1.0.0
When spark.akka.frameSize > 10, task results bigger than 10MiB block
Interesting, does anyone know the people over there who set it up? It would be
good if Apache itself could publish packages there, though I’m not sure what’s
involved. Since Spark just depends on Java and Python it should be easy for us
to update.
Matei
On Jun 18, 2014, at 1:37 PM, Nick
I was going to suggest the same thing :).
On Jun 18, 2014, at 4:56 PM, Evan R. Sparks evan.spa...@gmail.com wrote:
This looks like a job for SparkSQL!
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext._
case class MyRecord(country: String, name: String, age: Int,
There are a few options:
- Kryo might be able to serialize these objects out of the box, depending
what’s inside them. Try turning it on as described at
http://spark.apache.org/docs/latest/tuning.html.
- If that doesn’t work, you can create your own “wrapper” objects that
implement
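A sketch of the Kryo option, with placeholder names (`MyClass`, `MyRegistrator`, and the package are assumptions):

```scala
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoRegistrator

// Register the classes you ship in RDDs so Kryo writes compact class IDs.
class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo) {
    kryo.register(classOf[MyClass]) // placeholder class
  }
}

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "mypackage.MyRegistrator")
```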
It’s true that it can’t. You can try to use the CloudPickle library instead,
which is what we use within PySpark to serialize functions (see
python/pyspark/cloudpickle.py). However I’m also curious, why do you need an
RDD of functions?
Matei
On Jun 15, 2014, at 4:49 PM, madeleine
is that I'm using alternating minimization, so I'll be
minimizing over the rows and columns of this matrix at alternating steps;
hence I need to store both the matrix and its transpose to avoid data
thrashing.
On Mon, Jun 16, 2014 at 11:05 AM, Matei Zaharia [via Apache Spark User List]
[hidden
[
https://issues.apache.org/jira/browse/SPARK-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-1837.
--
Resolution: Fixed
Fix Version/s: 1.1.0
NumericRange should be partitioned in the same
The order is not guaranteed actually, only which keys end up in each partition.
Reducers may fetch data from map tasks in an arbitrary order, depending on
which ones are available first. If you’d like a specific order, you should sort
each partition. Here you might be getting it because each
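Sorting within each partition, as suggested above, can be sketched without moving any data between partitions (`grouped` is an assumed pair RDD):

```scala
// Sort each partition's contents locally; keys don't move, so the
// partitioner can be declared preserved.
val sortedWithin = grouped.mapPartitions(
  iter => iter.toSeq.sortBy(_._1).iterator,
  preservesPartitioning = true)
```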
You need to factor your program so that it’s not just a main(). This is not a
Spark-specific issue, it’s about how you’d unit test any program in general. In
this case, your main() creates a SparkContext, so you can’t pass one from
outside, and your code has to read data from a file and write
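One hypothetical way to do that factoring (all names here are illustrative): keep the transformation logic in a plain function that takes data in and returns data out, so a unit test can exercise it with no SparkContext and no files.

```scala
object WordCountLogic {
  // Pure logic: testable with ordinary collections.
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .groupBy(identity)
      .map { case (w, ws) => (w, ws.size) }
}
```

main() then only wires Spark and I/O around it, e.g. reading with sc.textFile, applying the logic per partition, and writing with saveAsTextFile.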
[
https://issues.apache.org/jira/browse/SPARK-889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14030080#comment-14030080
]
Matei Zaharia commented on SPARK-889:
-
This is a really old JIRA and actually I
(I’m forwarding this message on behalf of the ApacheCon organizers, who’d like
to see involvement from every Apache project!)
As you may be aware, ApacheCon will be held this year in Budapest, on November
17-23. (See http://apachecon.eu for more info.)
The Call For Papers for that conference
and will post it if I find it :)
Thank you, anyway
On Wed, Jun 11, 2014 at 12:19 AM, Matei Zaharia matei.zaha...@gmail.com
wrote:
It might be that conf/spark-env.sh on EC2 is configured to set it to 512, and
is overriding the application’s settings. Take a look in there and delete
Matei Zaharia created SPARK-2123:
Summary: Basic pluggable interface for shuffle
Key: SPARK-2123
URL: https://issues.apache.org/jira/browse/SPARK-2123
Project: Spark
Issue Type: Sub-task
Matei Zaharia created SPARK-2125:
Summary: Add sorting flag to ShuffleManager, and implement it in
HashShuffleManager
Key: SPARK-2125
URL: https://issues.apache.org/jira/browse/SPARK-2125
Project
Matei Zaharia created SPARK-2124:
Summary: Move aggregation into ShuffleManager implementations
Key: SPARK-2124
URL: https://issues.apache.org/jira/browse/SPARK-2124
Project: Spark
Issue
[
https://issues.apache.org/jira/browse/SPARK-2124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-2124:
-
Assignee: Saisai Shao
Move aggregation into ShuffleManager implementations
[
https://issues.apache.org/jira/browse/SPARK-2123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-2123.
--
Resolution: Fixed
Resolved in https://github.com/apache/spark/pull/1009
Basic pluggable
Matei Zaharia created SPARK-2126:
Summary: Move MapOutputTracker behind ShuffleManager interface
Key: SPARK-2126
URL: https://issues.apache.org/jira/browse/SPARK-2126
Project: Spark
Issue
Yes, actually even if you don’t set it to true, on-disk data is compressed.
(This setting only affects serialized data in memory).
Matei
On Jun 11, 2014, at 2:56 PM, Surendranauth Hiraman suren.hira...@velos.io
wrote:
Hi,
Will spark.rdd.compress=true enable compression when using
Alright, added you.
Matei
On Jun 11, 2014, at 1:28 PM, Derek Mansen de...@vistarmedia.com wrote:
Hello, I was wondering if we could add our organization to the Powered by
Spark page. The information is:
Name: Vistar Media
URL: www.vistarmedia.com
Description: Location technology company
combineByKey is designed for when your return type from the aggregation is
different from the values being aggregated (e.g. you group together objects),
and it should allow you to modify the leftmost argument of each function
(mergeCombiners, mergeValue, etc.) and return that instead of
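A sketch of that pattern, with an assumed RDD[(String, Int)] named `pairs`: the combiner type (a mutable Set) differs from the value type (Int), and each function mutates and returns its leftmost argument rather than allocating a new combiner.

```scala
import scala.collection.mutable

val setsPerKey = pairs.combineByKey(
  (v: Int) => mutable.Set(v),                                        // createCombiner
  (s: mutable.Set[Int], v: Int) => { s += v; s },                    // mergeValue: in place
  (s1: mutable.Set[Int], s2: mutable.Set[Int]) => { s1 ++= s2; s1 }) // mergeCombiners: in place
```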
[
https://issues.apache.org/jira/browse/SPARK-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14026701#comment-14026701
]
Matei Zaharia commented on SPARK-1416:
--
That pull request also added generic
It might be that conf/spark-env.sh on EC2 is configured to set it to 512, and
is overriding the application’s settings. Take a look in there and delete that
line if possible.
Matei
On Jun 10, 2014, at 2:38 PM, Aliaksei Litouka aliaksei.lito...@gmail.com
wrote:
I am testing my application in
[
https://issues.apache.org/jira/browse/SPARK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14025580#comment-14025580
]
Matei Zaharia commented on SPARK-2044:
--
Hey Weihua,
I'll look into the sorting flag
[
https://issues.apache.org/jira/browse/SPARK-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-1416.
--
Resolution: Fixed
Fix Version/s: 1.1.0
Target Version/s: 1.1.0
Implemented
[
https://issues.apache.org/jira/browse/SPARK-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-1416:
-
Assignee: Nick Pentreath
Add support for SequenceFiles in PySpark
If this is a useful feature for local mode, we should open a JIRA to document
the setting or improve it (I’d prefer to add a spark.local.retries property
instead of a special URL format). We initially disabled it for everything
except unit tests because 90% of the time an exception in local
You currently can’t have multiple SparkContext objects in the same JVM, but
within a SparkContext, all of the APIs are thread-safe so you can share that
context between multiple threads. The other issue you’ll run into is that in
each thread where you want to use Spark, you need to use
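The shared-context pattern can be sketched as follows (app name and workload are illustrative): one SparkContext, several threads each submitting their own jobs through it.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc   = new SparkContext(new SparkConf().setAppName("shared-context"))
val data = sc.parallelize(1 to 1000000).cache()

// Each thread runs its own action; the context's methods are thread-safe.
val threads = (1 to 4).map { i =>
  new Thread(new Runnable {
    def run() {
      println("thread " + i + " sum: " + data.map(_.toLong * i).reduce(_ + _))
    }
  })
}
threads.foreach(_.start())
threads.foreach(_.join())
```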
If we can disable the UI HTTP server, it would be much simpler to handle than
having two HTTP containers to deal with.
Chester
On Monday, June 9, 2014 4:35 PM, Matei Zaharia matei.zaha...@gmail.com
wrote:
You currently can’t have multiple SparkContext objects in the same JVM
[
https://issues.apache.org/jira/browse/SPARK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14021115#comment-14021115
]
Matei Zaharia commented on SPARK-2044:
--
Alright so I've posted my code at https
[
https://issues.apache.org/jira/browse/SPARK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020087#comment-14020087
]
Matei Zaharia commented on SPARK-2044:
--
{quote}
1. Is it a goal to support more kind
[
https://issues.apache.org/jira/browse/SPARK-2044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020329#comment-14020329
]
Matei Zaharia commented on SPARK-2044:
--
So BTW I think what I'll do is move over
Matei Zaharia created SPARK-2032:
Summary: Add an RDD.samplePartitions method for partition-level
sampling
Key: SPARK-2032
URL: https://issues.apache.org/jira/browse/SPARK-2032
Project: Spark
[
https://issues.apache.org/jira/browse/SPARK-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-2032:
-
Priority: Minor (was: Major)
Add an RDD.samplePartitions method for partition-level sampling
Matei Zaharia created SPARK-2043:
Summary: ExternalAppendOnlyMap doesn't always find matching keys
Key: SPARK-2043
URL: https://issues.apache.org/jira/browse/SPARK-2043
Project: Spark
Issue
Matei Zaharia created SPARK-2045:
Summary: Sort-based shuffle implementation
Key: SPARK-2045
URL: https://issues.apache.org/jira/browse/SPARK-2045
Project: Spark
Issue Type: New Feature
Matei Zaharia created SPARK-2047:
Summary: Use less memory in AppendOnlyMap.destructiveSortedIterator
Key: SPARK-2047
URL: https://issues.apache.org/jira/browse/SPARK-2047
Project: Spark
[
https://issues.apache.org/jira/browse/SPARK-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-2047:
-
Priority: Minor (was: Major)
Use less memory in AppendOnlyMap.destructiveSortedIterator
[
https://issues.apache.org/jira/browse/SPARK-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-2047:
-
Priority: Major (was: Minor)
Use less memory in AppendOnlyMap.destructiveSortedIterator
[
https://issues.apache.org/jira/browse/SPARK-2043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14019482#comment-14019482
]
Matei Zaharia commented on SPARK-2043:
--
https://github.com/apache/spark/pull/986