toBreeze is private within Spark; it should not be accessible to users. If you
want to make a Breeze vector from an MLlib one, it's pretty straightforward,
and you can make your own utility function for it.
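For example, a minimal helper along those lines (just a sketch, assuming the
Spark 1.1-era MLlib vector classes and Breeze on the classpath; the helper name
is our own, mirroring what the private method does):

    import breeze.linalg.{DenseVector => BDV, SparseVector => BSV, Vector => BV}
    import org.apache.spark.mllib.linalg.{DenseVector, SparseVector, Vector}

    // Hypothetical utility converting an MLlib vector to a Breeze one
    def mllibToBreeze(v: Vector): BV[Double] = v match {
      case dv: DenseVector  => new BDV[Double](dv.values)
      case sv: SparseVector => new BSV[Double](sv.indices, sv.values, sv.size)
    }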
Matei
On Oct 17, 2014, at 5:09 PM, Sean Owen so...@cloudera.com wrote:
Yes, I
After successful events in the past two years, the Spark Summit conference has
expanded for 2015, offering both an event in New York on March 18-19 and one in
San Francisco on June 15-17. The conference is a great chance to meet people
from throughout the Spark community and see the latest
Matei Zaharia created SPARK-3929:
Summary: Support for fixed-precision decimal
Key: SPARK-3929
URL: https://issues.apache.org/jira/browse/SPARK-3929
Project: Spark
Issue Type: New Feature
Matei Zaharia created SPARK-3930:
Summary: Add precision and scale to Spark SQL's Decimal type
Key: SPARK-3930
URL: https://issues.apache.org/jira/browse/SPARK-3930
Project: Spark
Issue Type
[
https://issues.apache.org/jira/browse/SPARK-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-3929:
-
Description: Spark SQL should support fixed-precision decimals, which are
available in Hive 0.13
Matei Zaharia created SPARK-3931:
Summary: Support reading fixed-precision decimals from Parquet
Key: SPARK-3931
URL: https://issues.apache.org/jira/browse/SPARK-3931
Project: Spark
Issue
Matei Zaharia created SPARK-3932:
Summary: Support reading fixed-precision decimals from Hive 0.13
Key: SPARK-3932
URL: https://issues.apache.org/jira/browse/SPARK-3932
Project: Spark
Issue
Matei Zaharia created SPARK-3933:
Summary: Optimize decimal type in Spark SQL for those with small
precision
Key: SPARK-3933
URL: https://issues.apache.org/jira/browse/SPARK-3933
Project: Spark
of issues. Thanks in advance!
On Oct 10, 2014 10:54 AM, Matei Zaharia matei.zaha...@gmail.com wrote:
Hi folks,
I interrupt your regularly scheduled user / dev list to bring you some pretty
cool news for the project, which is that we've been able to use Spark to
break MapReduce's 100 TB
Hi Michael,
I've been working on this in my repo:
https://github.com/mateiz/spark/tree/decimal. I'll make some pull requests with
these features soon, but meanwhile you can try this branch. See
https://github.com/mateiz/spark/compare/decimal for the individual commits that
went into it. It
the
values as a parquet binary type. Why not write them using the int64 parquet
type instead?
Cheers,
Michael
On Oct 12, 2014, at 3:32 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
Hi Michael,
I've been working on this in my repo:
https://github.com/mateiz/spark/tree/decimal. I'll make
Very cool Denny, thanks for sharing this!
Matei
On Oct 11, 2014, at 9:46 AM, Denny Lee denny.g@gmail.com wrote:
https://www.concur.com/blog/en-us/connect-tableau-to-sparksql
If you're wondering how to connect Tableau to SparkSQL - here are the steps
to connect Tableau to SparkSQL.
Hi folks,
I interrupt your regularly scheduled user / dev list to bring you some pretty
cool news for the project, which is that we've been able to use Spark to break
MapReduce's 100 TB and 1 PB sort records, sorting data 3x faster on 10x fewer
nodes. There's a detailed writeup at
Added you, thanks! (You may have to shift-refresh the page to see it updated).
Matei
On Oct 10, 2014, at 1:52 PM, Michael Oczkowski michael.oczkow...@seeq.com
wrote:
Please add the Boulder-Denver Spark meetup group to the list on the website.
Thanks for the feedback. For 1, there is an open patch:
https://github.com/apache/spark/pull/2659. For 2, broadcast blocks actually use
MEMORY_AND_DISK storage, so they will spill to disk if you have low memory, but
they're faster to access otherwise.
Matei
On Oct 9, 2014, at 12:11 PM,
Oops I forgot to add, for 2, maybe we can add a flag to use DISK_ONLY for
TorrentBroadcast, or if the broadcasts are bigger than some size.
Matei
On Oct 9, 2014, at 3:04 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
Thanks for the feedback. For 1, there is an open patch:
https
A SchemaRDD is still an RDD, so you can just do rdd.map(row => row.toString).
Or if you want to get a particular field of the row, you can do rdd.map(row =>
row(3).toString).
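For instance (a sketch, assuming schemaRDD is the SchemaRDD in question):

    val asStrings   = schemaRDD.map(row => row.toString)     // whole Row as a String
    val fourthField = schemaRDD.map(row => row(3).toString)  // just the field at index 3
    asStrings.take(5).foreach(println)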
Matei
On Oct 9, 2014, at 1:22 PM, Soumya Simanta soumya.sima...@gmail.com wrote:
I've a SchemaRDD that I want to
I'm pretty sure inner joins on Spark SQL already build only one of the sides.
Take a look at ShuffledHashJoin, which calls HashJoin.joinIterators. Only outer
joins do both, and it seems like we could optimize it for those that are not
full.
Matei
On Oct 7, 2014, at 11:04 PM, Haopu Wang
[
https://issues.apache.org/jira/browse/SPARK-3762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-3762.
--
Resolution: Fixed
Fix Version/s: 1.2.0
clear all SparkEnv references after stop
Maybe there is a firewall issue that makes it slow for your nodes to connect
through the IP addresses they're configured with. I see there's this 10 second
pause between "Updated info of block broadcast_84_piece1" and
"ensureFreeSpace(4194304) called" (where it actually receives the block). HTTP
The issue is that you're using SQLContext instead of HiveContext. SQLContext
implements a smaller subset of the SQL language and so you're getting a SQL
parse error because it doesn't support the syntax you have. Look at how you'd
write this in HiveQL, and then try doing that with HiveContext.
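Something like this (a sketch; sc is your existing SparkContext and the table
and query are placeholders):

    import org.apache.spark.sql.hive.HiveContext

    // Use HiveContext instead of SQLContext to get the larger HiveQL dialect
    val hiveContext = new HiveContext(sc)
    val result = hiveContext.sql("SELECT key, COUNT(*) FROM my_table GROUP BY key")
    result.collect().foreach(println)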
[
https://issues.apache.org/jira/browse/SPARK-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161117#comment-14161117
]
Matei Zaharia commented on SPARK-3633:
--
I'm curious, why do you think this is caused
[
https://issues.apache.org/jira/browse/SPARK-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161142#comment-14161142
]
Matei Zaharia commented on SPARK-3633:
--
In that case though, the problem might
[
https://issues.apache.org/jira/browse/SPARK-2530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-2530.
--
Resolution: Fixed
Fix Version/s: 1.1.0
This was fixed by SPARK-2711.
Relax incorrect
Hi Tom,
HDFS and Spark don't actually have a minimum block size -- so in that first
dataset, the files won't each be costing you 64 MB. However, the main reason
for the difference in performance here is probably the number of RDD partitions. In
the first case, Spark will create an RDD with 1
PySpark doesn't attempt to support Jython at present. IMO while it might be a
bit faster, it would lose a lot of the benefits of Python, which are the very
strong data processing libraries (NumPy, SciPy, Pandas, etc). So I'm not sure
it's worth supporting unless someone demonstrates a really
Pretty cool, thanks for sharing this! I've added a link to it on the wiki:
https://cwiki.apache.org/confluence/display/SPARK/Supplemental+Spark+Projects.
Matei
On Oct 1, 2014, at 1:41 PM, Koert Kuipers ko...@tresata.com wrote:
well, sort of! we make input/output formats (cascading taps,
It should just work in PySpark, the same way it does in Java / Scala apps.
Matei
On Oct 1, 2014, at 4:12 PM, Sungwook Yoon sy...@maprtech.com wrote:
Yes.. you should use maprfs://
I personally haven't used pyspark, I just used scala shell or standalone with
MapR.
I think you need to
You need to set --total-executor-cores to limit how many total cores it grabs
on the cluster. --executor-cores is just for each individual executor, but it
will try to launch many of them.
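If you'd rather set it in code, spark.cores.max is the configuration equivalent
of --total-executor-cores (a sketch; the app name and value are placeholders):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("MyApp")
      .set("spark.cores.max", "8")   // total cores across all executors on the cluster
    val sc = new SparkContext(conf)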
Matei
On Oct 1, 2014, at 4:29 PM, Sanjay Subramanian
sanjaysubraman...@yahoo.com.INVALID wrote:
hey
[
https://issues.apache.org/jira/browse/SPARK-3356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-3356.
--
Resolution: Fixed
Fix Version/s: 1.2.0
Document when RDD elements' ordering within
[
https://issues.apache.org/jira/browse/SPARK-3356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-3356:
-
Assignee: Sean Owen
Document when RDD elements' ordering within partitions is nondeterministic
[
https://issues.apache.org/jira/browse/SPARK-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-3032.
--
Resolution: Fixed
Fix Version/s: 1.2.0
1.1.1
Potential bug when
[
https://issues.apache.org/jira/browse/SPARK-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14152014#comment-14152014
]
Matei Zaharia commented on SPARK-3032:
--
Yup, this will appear in 1.1.1. I've merged
[
https://issues.apache.org/jira/browse/SPARK-3389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-3389.
--
Resolution: Fixed
Fix Version/s: 1.2.0
Add converter class to make reading Parquet
[
https://issues.apache.org/jira/browse/SPARK-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-2745:
-
Assignee: Sean Owen (was: Sean Owen)
Add Java friendly methods to Duration class
[
https://issues.apache.org/jira/browse/SPARK-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-2745:
-
Assignee: Sean Owen (was: Tathagata Das)
Add Java friendly methods to Duration class
[
https://issues.apache.org/jira/browse/SPARK-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-2745.
--
Resolution: Fixed
Fix Version/s: 1.2.0
Add Java friendly methods to Duration class
[
https://issues.apache.org/jira/browse/SPARK-3389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-3389:
-
Target Version/s: 1.2.0
Add converter class to make reading Parquet files easy with PySpark
[
https://issues.apache.org/jira/browse/SPARK-3389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-3389:
-
Assignee: Uri Laserson
Add converter class to make reading Parquet files easy with PySpark
[
https://issues.apache.org/jira/browse/SPARK-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14145324#comment-14145324
]
Matei Zaharia edited comment on SPARK-3129 at 9/23/14 7:53 PM
[
https://issues.apache.org/jira/browse/SPARK-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14145324#comment-14145324
]
Matei Zaharia commented on SPARK-3129:
--
Is that 100 MB/s per node or in total
[
https://issues.apache.org/jira/browse/SPARK-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14145537#comment-14145537
]
Matei Zaharia commented on SPARK-3129:
--
Alright, in that case, this sounds pretty
Is your file managed by Hive (and thus present in a Hive metastore)? In that
case, Spark SQL
(https://spark.apache.org/docs/latest/sql-programming-guide.html) is the
easiest way.
Matei
On September 23, 2014 at 2:26:10 PM, Pramod Biligiri (pramodbilig...@gmail.com)
wrote:
Hi,
I'm trying to
Matei Zaharia created SPARK-3643:
Summary: Add cluster-specific config settings to configuration page
Key: SPARK-3643
URL: https://issues.apache.org/jira/browse/SPARK-3643
Project: Spark
[
https://issues.apache.org/jira/browse/SPARK-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144244#comment-14144244
]
Matei Zaharia commented on SPARK-3032:
--
Yeah actually I'm sure TimSort works fine
[
https://issues.apache.org/jira/browse/SPARK-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14144243#comment-14144243
]
Matei Zaharia commented on SPARK-3032:
--
I'm not completely sure that this is because
[
https://issues.apache.org/jira/browse/SPARK-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-3032:
-
Assignee: Saisai Shao
Potential bug when running sort-based shuffle with sorting using TimSort
File takes a filename to write to, while Dataset takes only a JobConf. This
means that Dataset is more general (it can also save to storage systems that
are not file systems, such as key-value stores), but is more annoying to use if
you actually have a file.
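Roughly (a sketch, assuming the two methods in question are saveAsHadoopFile /
saveAsHadoopDataset and that pairs is an RDD[(NullWritable, Text)]; paths are
placeholders):

    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.{NullWritable, Text}
    import org.apache.hadoop.mapred.{FileOutputFormat, JobConf, TextOutputFormat}

    // The File variant takes the output path directly
    pairs.saveAsHadoopFile("hdfs:///tmp/out", classOf[NullWritable], classOf[Text],
      classOf[TextOutputFormat[NullWritable, Text]])

    // The Dataset variant gets everything, including the destination, from a JobConf,
    // which is why it can also target non-filesystem stores
    val jobConf = new JobConf()
    jobConf.setOutputFormat(classOf[TextOutputFormat[NullWritable, Text]])
    jobConf.setOutputKeyClass(classOf[NullWritable])
    jobConf.setOutputValueClass(classOf[Text])
    FileOutputFormat.setOutputPath(jobConf, new Path("hdfs:///tmp/out"))
    pairs.saveAsHadoopDataset(jobConf)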
Matei
On September 21, 2014 at
Matei Zaharia created SPARK-3628:
Summary: Don't apply accumulator updates multiple times for tasks
in result stages
Key: SPARK-3628
URL: https://issues.apache.org/jira/browse/SPARK-3628
Project
[
https://issues.apache.org/jira/browse/SPARK-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142756#comment-14142756
]
Matei Zaharia commented on SPARK-3628:
--
BTW the problem is that this used
[
https://issues.apache.org/jira/browse/SPARK-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142756#comment-14142756
]
Matei Zaharia edited comment on SPARK-3628 at 9/21/14 10:43 PM
[
https://issues.apache.org/jira/browse/SPARK-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14142756#comment-14142756
]
Matei Zaharia edited comment on SPARK-3628 at 9/21/14 10:49 PM
[
https://issues.apache.org/jira/browse/SPARK-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-3628:
-
Target Version/s: 1.1.1, 1.2.0, 1.0.3 (was: 1.1.1, 1.2.0, 0.9.3, 1.0.3)
Don't apply accumulator
[
https://issues.apache.org/jira/browse/SPARK-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-3628:
-
Target Version/s: 1.1.1, 1.2.0, 0.9.3, 1.0.3 (was: 1.1.1, 1.2.0, 1.0.3)
Don't apply accumulator
[
https://issues.apache.org/jira/browse/SPARK-3629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-3629:
-
Description:
Right now this doc starts off with a big list of config options, and only
:10 AM, Matei Zaharia wrote:
Hey Sandy,
On September 20, 2014 at 8:50:54 AM, Sandy Ryza (sandy.r...@cloudera.com) wrote:
Hey All,
A couple questions came up about shared variables recently, and I wanted to
confirm my understanding and update the doc to be a little more clear.
*Broadcast
Matei Zaharia created SPARK-3611:
Summary: Show number of cores for each executor in application web
UI
Key: SPARK-3611
URL: https://issues.apache.org/jira/browse/SPARK-3611
Project: Spark
Matei Zaharia created SPARK-3619:
Summary: Upgrade to Mesos 0.21 to work around MESOS-1688
Key: SPARK-3619
URL: https://issues.apache.org/jira/browse/SPARK-3619
Project: Spark
Issue Type
Hey Sandy,
On September 20, 2014 at 8:50:54 AM, Sandy Ryza (sandy.r...@cloudera.com) wrote:
Hey All,
A couple questions came up about shared variables recently, and I wanted to
confirm my understanding and update the doc to be a little more clear.
*Broadcast variables*
Now that tasks data
[
https://issues.apache.org/jira/browse/SPARK-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14141382#comment-14141382
]
Matei Zaharia commented on SPARK-3129:
--
So Hari, what is the maximum sustainable rate
[
https://issues.apache.org/jira/browse/SPARK-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14139163#comment-14139163
]
Matei Zaharia commented on SPARK-2593:
--
Sure, it would be great to do
Hey Dave, try out RDD.toLocalIterator -- it gives you an iterator that reads
one RDD partition at a time. Scala iterators also have methods like grouped()
that let you get fixed-size groups.
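For example (a sketch; rdd and the batch size are placeholders):

    // Pulls one partition at a time to the driver, then hands back fixed-size batches
    rdd.toLocalIterator.grouped(1000).foreach { batch =>
      println(s"processing a batch of ${batch.size} elements")
    }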
Matei
On September 18, 2014 at 7:58:34 PM, dave-anderson (david.ander...@pobox.com)
wrote:
I have an
[
https://issues.apache.org/jira/browse/SPARK-3530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137719#comment-14137719
]
Matei Zaharia commented on SPARK-3530:
--
To comment on the versioning stuff here
[
https://issues.apache.org/jira/browse/SPARK-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138281#comment-14138281
]
Matei Zaharia commented on SPARK-3129:
--
Great, it will be nice to see how fast
[
https://issues.apache.org/jira/browse/SPARK-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-2620:
-
Affects Version/s: 1.1.0
case class cannot be used as key for reduce
[
https://issues.apache.org/jira/browse/SPARK-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138390#comment-14138390
]
Matei Zaharia commented on SPARK-2593:
--
The reason that we don't want to expose Akka
[
https://issues.apache.org/jira/browse/SPARK-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138402#comment-14138402
]
Matei Zaharia commented on SPARK-2593:
--
BTW doing this for the ActorReceiver
I'm pretty sure it does help, though I don't have any numbers for it. In any
case, Spark will automatically benefit from it if you link Spark against a
version of HDFS that includes the feature.
Matei
On September 17, 2014 at 5:15:47 AM, Gary Malouf (malouf.g...@gmail.com) wrote:
Cloudera had a blog post
If you want to run the computation on just one machine (using Spark's local
mode), it can probably run in a container. Otherwise you can create a
SparkContext there and connect it to a cluster outside. Note that I haven't
tried this though, so the security policies of the container might be too
.count(). As you can
see, count() does not need to serialize and ship data while the other three
methods do.
Do you recall any difference between spark 1.0 and 1.1 that might cause this
problem?
Thanks,
Du
From: Matei Zaharia matei.zaha...@gmail.com
Date: Friday, September 12, 2014 at 9:10 PM
Scala 2.11 work is under way in open pull requests though, so hopefully it will
be in soon.
Matei
On September 15, 2014 at 9:48:42 AM, Mohit Jaggi (mohitja...@gmail.com) wrote:
ah...thanks!
On Mon, Sep 15, 2014 at 9:47 AM, Mark Hamstra m...@clearstorydata.com wrote:
No, not yet. Spark SQL is
at the earliest.
On Mon, Sep 15, 2014 at 12:11 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
Scala 2.11 work is under way in open pull requests though, so hopefully it will
be in soon.
Matei
On September 15, 2014 at 9:48:42 AM, Mohit Jaggi (mohitja...@gmail.com) wrote:
ah...thanks!
On Mon, Sep 15
It's true that it does not send a kill command right now -- we should probably
add that. This code was written before tasks were killable AFAIK. However, the
*job* should still finish while a speculative task is running as far as I know,
and it will just leave that task behind.
Matei
On
sortByKey is indeed O(n log n); it does a first pass to figure out even-sized
partitions (by sampling the RDD), then a second pass to do a distributed
merge-sort (first partition the data on each machine, then run a reduce phase
that merges the data for each partition). The point where it becomes
[
https://issues.apache.org/jira/browse/SPARK-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1412#comment-1412
]
Matei Zaharia commented on SPARK-1449:
--
Hey folks, sorry for the delay -- will look
[
https://issues.apache.org/jira/browse/SPARK-1449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-1449:
-
Assignee: Patrick Wendell
Please delete old releases from mirroring system
I've seen the "file name too long" error when compiling on an encrypted Linux
file system -- some of them have a limit on file name lengths. If you're on
Linux, can you try compiling inside /tmp instead?
Matei
On September 13, 2014 at 10:03:14 PM, Yin Huai (huaiyin@gmail.com) wrote:
Can you
Hi Du,
I don't think NullWritable has ever been serializable, so you must be doing
something differently from your previous program. In this case though, just use
a map() to turn your Writables to serializable types (e.g. null and String).
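For example (a sketch, assuming the data is a SequenceFile of NullWritable/Text
pairs; the path is a placeholder):

    import org.apache.hadoop.io.{NullWritable, Text}

    val raw = sc.sequenceFile("hdfs:///data/input", classOf[NullWritable], classOf[Text])
    // Convert the Writables to plain serializable types right away
    val usable = raw.map { case (_, v) => v.toString }
    usable.cache()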
Matei
On September 12, 2014 at 8:48:36 PM, Du Li
Thanks to everyone who contributed to implementing and testing this release!
Matei
On September 11, 2014 at 11:52:43 PM, Tim Smith (secs...@gmail.com) wrote:
Thanks for all the good work. Very excited about seeing more features and
better stability in the framework.
On Thu, Sep 11, 2014 at
[
https://issues.apache.org/jira/browse/SPARK-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia reassigned SPARK-2048:
Assignee: Matei Zaharia
Optimizations to CPU usage of external spilling code
[
https://issues.apache.org/jira/browse/SPARK-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-2048.
--
Resolution: Fixed
Optimizations to CPU usage of external spilling code
[
https://issues.apache.org/jira/browse/SPARK-2048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14125247#comment-14125247
]
Matei Zaharia commented on SPARK-2048:
--
Yeah, sounds good, thanks for pointing
[
https://issues.apache.org/jira/browse/SPARK-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-2978.
--
Resolution: Fixed
Fix Version/s: 1.2.0
Provide an MR-style shuffle transformation
[
https://issues.apache.org/jira/browse/SPARK-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-2978:
-
Assignee: Sandy Ryza
Provide an MR-style shuffle transformation
[
https://issues.apache.org/jira/browse/SPARK-3444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-3444:
-
Assignee: Holden Karau (was: Holden Karau)
Provide a way to easily change the log level
[
https://issues.apache.org/jira/browse/SPARK-3444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-3444:
-
Assignee: Holden Karau
Provide a way to easily change the log level in the Spark shell while
[
https://issues.apache.org/jira/browse/SPARK-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14126175#comment-14126175
]
Matei Zaharia commented on SPARK-3441:
--
I agree that we should have more of a doc
[
https://issues.apache.org/jira/browse/SPARK-3394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-3394.
--
Resolution: Fixed
Fix Version/s: 1.0.3
1.2.0
1.1.1
[
https://issues.apache.org/jira/browse/SPARK-3394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-3394:
-
Component/s: Spark Core
TakeOrdered crashes when limit is 0
[
https://issues.apache.org/jira/browse/SPARK-3353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-3353.
--
Resolution: Fixed
Fix Version/s: 1.2.0
Stage id monotonicity (parent stage should have
[
https://issues.apache.org/jira/browse/SPARK-3211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-3211:
-
Target Version/s: 1.1.1, 1.2.0
.take() is OOM-prone when there are empty partitions
[
https://issues.apache.org/jira/browse/SPARK-3211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-3211:
-
Assignee: Andrew Ash
.take() is OOM-prone when there are empty partitions
[
https://issues.apache.org/jira/browse/SPARK-3211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-3211.
--
Resolution: Fixed
Fix Version/s: 1.2.0
1.1.1
.take() is OOM-prone
[
https://issues.apache.org/jira/browse/SPARK-640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14121883#comment-14121883
]
Matei Zaharia commented on SPARK-640:
-
[~pwendell] what is our Hadoop 1 version on AMIs
Hi Mohit,
This looks pretty interesting, but just a note on the implementation -- it
might be worthwhile to try doing this on top of Spark SQL SchemaRDDs. The
reason is that SchemaRDDs already have an efficient in-memory representation
(columnar storage), and can be read from a variety of data