You can modify project/SparkBuild.scala and build Spark with sbt instead of
Maven.
On Jun 5, 2014, at 12:36 PM, Meisam Fathi meisam.fa...@gmail.com wrote:
Hi community,
How should I change sbt to compile spark core with a different version
of Scala? I see maven pom files define
in java and port
it into Spark?
Best regards,
Wei
-
Wei Tan, PhD
Research Staff Member
IBM T. J. Watson Research Center
http://researcher.ibm.com/person/us-wtan
From: Matei Zaharia matei.zaha...@gmail.com
To: user
, June 5, 2014 1:35 AM, Matei Zaharia matei.zaha...@gmail.com
wrote:
If this isn’t the problem, it would be great if you can post the code for the
program.
Matei
On Jun 4, 2014, at 12:58 PM, Xu (Simon) Chen xche...@gmail.com wrote:
Maybe your two workers have different assembly jar files?
Matei Zaharia created SPARK-2013:
Summary: Add Python pickleFile to programming guide
Key: SPARK-2013
URL: https://issues.apache.org/jira/browse/SPARK-2013
Project: Spark
Issue Type
Matei Zaharia created SPARK-2014:
Summary: Make PySpark store RDDs in MEMORY_ONLY_SER with
compression by default
Key: SPARK-2014
URL: https://issues.apache.org/jira/browse/SPARK-2014
Project: Spark
[
https://issues.apache.org/jira/browse/SPARK-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-2013:
-
Assignee: Kan Zhang
Add Python pickleFile to programming guide
[
https://issues.apache.org/jira/browse/SPARK-1912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-1912:
-
Target Version/s: 0.9.2, 1.0.1, 1.1.0 (was: 0.9.2, 1.0.1)
Compression memory issue during
[
https://issues.apache.org/jira/browse/SPARK-1912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-1912:
-
Target Version/s: 0.9.2, 1.0.1
Compression memory issue during reduce
Matei Zaharia created SPARK-2024:
Summary: Add saveAsSequenceFile to PySpark
Key: SPARK-2024
URL: https://issues.apache.org/jira/browse/SPARK-2024
Project: Spark
Issue Type: New Feature
I just ran into a similar problem that my spark-shell is using a
Yes, you can write some glue in Spark to call these. Some functions to look at:
- SparkContext.hadoopRDD lets you create an input RDD from an existing JobConf
configured by Hadoop (including InputFormat, paths, etc)
- RDD.mapPartitions lets you operate on all the values in one partition (block)
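The mapPartitions pattern mentioned above can be sketched in plain Python. This is an illustrative stand-in that does not assume a running Spark cluster: plain lists play the role of partitions, and parse_partition is a hypothetical per-partition function.

```python
# Plain-Python sketch of the mapPartitions idea: the function receives
# an iterator over one partition's records, so per-partition setup cost
# (opening a parser, a connection, etc.) is paid once per block rather
# than once per record.

def parse_partition(lines):
    parser = str.upper  # hypothetical per-partition setup
    for line in lines:
        yield parser(line)

# Plain lists stand in for an RDD's partitions here.
partitions = [["a", "b"], ["c"]]
result = [x for part in partitions for x in parse_partition(part)]
```

In PySpark the same function would be passed as rdd.mapPartitions(parse_partition).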
than just one line? (Of course you would have to click to expand it.)
On Wed, Jun 4, 2014 at 2:38 AM, John Salvatier jsalvat...@gmail.com wrote:
Ok, I will probably open a Jira.
On Tue, Jun 3, 2014 at 5:29 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
You can use RDD.setName to give
In PySpark, the data processed by each reduce task needs to fit in memory
within the Python process, so you should use more tasks to process this
dataset. Data is spilled to disk across tasks.
I’ve created https://issues.apache.org/jira/browse/SPARK-2021 to track this —
it’s something we’ve
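The advice above, using more reduce tasks so each Python process holds less data, can be sketched without Spark. The hash-partitioning below only mimics Spark's default partitioner, and the numbers are illustrative.

```python
from collections import defaultdict

def reduce_task_sizes(pairs, num_tasks):
    # Hash-partition keys across reduce tasks, as Spark's default
    # partitioner does; each task must hold its partition's aggregation
    # state in a single Python process's memory.
    buckets = defaultdict(dict)
    for k, v in pairs:
        task = hash(k) % num_tasks
        buckets[task][k] = buckets[task].get(k, 0) + v
    # Report how many distinct keys (i.e. how much state) each task holds.
    return {t: len(d) for t, d in buckets.items()}

pairs = [(i % 1000, 1) for i in range(10000)]
few = reduce_task_sizes(pairs, 2)    # 2 tasks: lots of state per task
many = reduce_task_sizes(pairs, 50)  # 50 tasks: much less per task
```

In PySpark this corresponds to passing a larger numPartitions to the shuffle, e.g. rdd.reduceByKey(add, 400).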
All of these are disposed of automatically if you stop the context or exit the
program.
Matei
On Jun 4, 2014, at 2:22 PM, Daniel Siegmann daniel.siegm...@velos.io wrote:
Will the broadcast variables be disposed automatically if the context is
stopped, or do I still need to unpersist()?
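The lifecycle described above, broadcast variables being cleaned up automatically when the context stops, can be mimicked with a small stand-in class. FakeContext is purely illustrative and not a Spark API.

```python
# Stand-in sketch of the broadcast lifecycle (no Spark assumed).

class FakeContext:
    def __init__(self):
        self._broadcasts = []
        self.stopped = False

    def broadcast(self, value):
        b = {"value": value, "persisted": True}
        self._broadcasts.append(b)
        return b

    def stop(self):
        # Stopping the context disposes of all broadcast variables,
        # matching the behaviour described in the answer above.
        for b in self._broadcasts:
            b["persisted"] = False
        self.stopped = True

sc = FakeContext()
table = sc.broadcast({"a": 1})
sc.stop()  # no explicit unpersist() needed for cleanup at shutdown
```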
to include Python APIs in Spark Streaming?
Anytime frame on this?
Thanks!
John
On Thu, May 29, 2014 at 4:19 PM, Matei Zaharia matei.zaha...@gmail.com
wrote:
Quite a few people ask this question and the answer is pretty simple. When we
started Spark, we had two goals — we wanted to work
Are you using the logistic_regression.py in examples/src/main/python or
examples/src/main/python/mllib? The first one is an example of writing logistic
regression by hand and won’t be as efficient as the MLlib one. I suggest trying
the MLlib one.
You may also want to check how many iterations.
The MLLib version of logistic regression doesn't seem to use all the cores on
my machine.
Regards,
Krishna
[
https://issues.apache.org/jira/browse/SPARK-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14017001#comment-14017001
]
Matei Zaharia commented on SPARK-1790:
--
It's fine to skip the check right now; I
[
https://issues.apache.org/jira/browse/SPARK-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-1942:
-
Fix Version/s: 1.1.0
Stop clearing spark.driver.port in unit tests
[
https://issues.apache.org/jira/browse/SPARK-1912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-1912.
--
Resolution: Fixed
Compression memory issue during reduce
[
https://issues.apache.org/jira/browse/SPARK-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-1992:
-
Assignee: Christian Tzolov
Support for Pivotal HD in the Maven build
[
https://issues.apache.org/jira/browse/SPARK-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-1992:
-
Fix Version/s: 1.0.1
Support for Pivotal HD in the Maven build
[
https://issues.apache.org/jira/browse/SPARK-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-1992:
-
Issue Type: Improvement (was: Bug)
Support for Pivotal HD in the Maven build
[
https://issues.apache.org/jira/browse/SPARK-1992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-1992:
-
Fix Version/s: 1.1.0
Support for Pivotal HD in the Maven build
[
https://issues.apache.org/jira/browse/SPARK-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-1468.
--
Resolution: Fixed
The hash method used by partitionBy in Pyspark doesn't deal with None
[
https://issues.apache.org/jira/browse/SPARK-1161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-1161.
--
Resolution: Fixed
Merged this in -- thanks Kan!
Add saveAsObjectFile
[
https://issues.apache.org/jira/browse/SPARK-1161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-1161:
-
Fix Version/s: 1.1.0
Add saveAsObjectFile and SparkContext.objectFile in Python
Done. Looks like this was lost in the JIRA import.
Matei
On Jun 3, 2014, at 11:33 AM, Henry Saputra henry.sapu...@gmail.com wrote:
Hi,
Could someone with right karma kindly add my username (hsaputra) to
Spark's contributor list?
I was added before but somehow now I can no longer assign
Yup, it’s meant to be just a Map. You should probably use collect() and build a
multimap instead if you’d like that.
Matei
On Jun 3, 2014, at 2:08 PM, Doris Xin doris.s@gmail.com wrote:
Hey guys,
Just wanted to check real quick if collectAsMap was by design not to
return a multimap
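The suggestion above can be sketched without Spark: fold the key/value pairs that collect() would return into lists, rather than letting collectAsMap keep only one value per key. The plain list of pairs below is a stand-in for rdd.collect().

```python
from collections import defaultdict

def build_multimap(pairs):
    # collectAsMap() keeps a single value per key; folding the raw
    # pairs from collect() keeps all of them.
    mm = defaultdict(list)
    for k, v in pairs:
        mm[k].append(v)
    return dict(mm)

pairs = [("a", 1), ("b", 2), ("a", 3)]  # stand-in for rdd.collect()
multimap = build_multimap(pairs)
```

For contrast, dict(pairs) behaves like collectAsMap here and silently keeps only the last value seen for "a".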
Yeah unfortunately Hadoop 2 requires these binaries on Windows. Hadoop 1 runs
just fine without them.
Matei
On Jun 3, 2014, at 10:33 AM, Sean Owen so...@cloudera.com wrote:
I'd try the internet / SO first -- these are actually generic
Hadoop-related issues. Here I think you don't have
You can use RDD.setName to give it a name. There’s also a creationSite field
that is private[spark] — we may want to add a public setter for that later. If
the name isn’t enough and you’d like this, please open a JIRA issue for it.
Matei
On Jun 3, 2014, at 5:22 PM, John Salvatier
What Java version do you have, and how did you get Spark (did you build it
yourself by any chance or download a pre-built one)? If you build Spark
yourself you need to do it with Java 6 — it’s a known issue because of the way
Java 6 and 7 package JAR files. But I haven’t seen it result in this
Ghost, it's the dream language
we've theorized about for years! I hadn't realized!
Indeed, glad you’re enjoying it.
Matei
On Mon, Jun 2, 2014 at 12:05 PM, Matei Zaharia matei.zaha...@gmail.com
wrote:
FYI, I opened https://issues.apache.org/jira/browse/SPARK-1990 to track this.
Matei
You can copy your configuration from the old one. I’d suggest just downloading
it to a different location on each node first for testing, then you can delete
the old one if things work.
On Jun 3, 2014, at 12:38 AM, MEETHU MATHEW meethu2...@yahoo.co.in wrote:
Hi ,
I am currently using
Matei Zaharia created SPARK-1996:
Summary: Remove use of special Maven repo for Akka
Key: SPARK-1996
URL: https://issues.apache.org/jira/browse/SPARK-1996
Project: Spark
Issue Type
Madhu, can you send me your Wiki username? (Sending it just to me is fine.) I
can add you to the list to edit it.
Matei
On Jun 2, 2014, at 6:27 PM, Reynold Xin r...@databricks.com wrote:
I tried but didn't find where I could add you. You probably need Matei to
help out with this.
On
You can just use the Maven build for now, even for Spark 1.0.0.
Matei
On Jun 2, 2014, at 5:30 PM, Mohit Nayak wiza...@gmail.com wrote:
Hey,
Yup that fixed it. Thanks so much!
Is this the only solution, or could this be resolved in future versions of
Spark ?
On Mon, Jun 2, 2014 at
Matei Zaharia created SPARK-1989:
Summary: Exit executors faster if they get into a cycle of heavy GC
Key: SPARK-1989
URL: https://issues.apache.org/jira/browse/SPARK-1989
Project: Spark
Matei Zaharia created SPARK-1990:
Summary: spark-ec2 should only need Python 2.6, not 2.7
Key: SPARK-1990
URL: https://issues.apache.org/jira/browse/SPARK-1990
Project: Spark
Issue Type
[
https://issues.apache.org/jira/browse/SPARK-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-1790:
-
Fix Version/s: 1.0.1
Update EC2 scripts to support r3 instance types
[
https://issues.apache.org/jira/browse/SPARK-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14015146#comment-14015146
]
Matei Zaharia commented on SPARK-1990:
--
BTW here is the first error this gets:
{code
Why do you need to call Serializer from your own program? It’s an internal
developer API so ideally it would only be called to extend Spark. Are you
looking to implement a custom Serializer?
Matei
On Jun 1, 2014, at 3:40 PM, Soren Macbeth so...@yieldbot.com wrote:
BTW passing a ClassTag tells the Serializer what the type of object being
serialized is when you compile your program, which will allow for more
efficient serializers (especially on streams).
Matei
On Jun 1, 2014, at 4:24 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
Why do you need
it by making ClassTag
object in clojure, but it's less than ideal.
On Sun, Jun 1, 2014 at 4:25 PM, Matei Zaharia matei.zaha...@gmail.com
wrote:
BTW passing a ClassTag tells the Serializer what the type of object being
serialized is when you compile your program, which will allow for more
, 2014 at 5:10 PM, Matei Zaharia matei.zaha...@gmail.com
wrote:
Ah, got it. In general it will always be safe to pass the ClassTag for
java.lang.Object here — this is what our Java API does to say that type
info is not known. So you can always pass that. Look at the Java code for
how to get
More specifically with the -a flag, you *can* set your own AMI, but you’ll need
to base it off ours. This is because spark-ec2 assumes that some packages (e.g.
java, Python 2.6) are already available on the AMI.
Matei
On Jun 1, 2014, at 11:01 AM, Patrick Wendell pwend...@gmail.com wrote:
Hey
1, 2014, at 3:11 PM, PJ$ p...@chickenandwaffl.es wrote:
Running on a few m3.larges with the ami-848a6eec image (debian 7). Haven't
gotten any further. No clue what's wrong. I'd really appreciate any guidance
y'all could offer.
Best,
PJ$
On Sat, May 31, 2014 at 1:40 PM, Matei Zaharia
FYI, I opened https://issues.apache.org/jira/browse/SPARK-1990 to track this.
Matei
On Jun 1, 2014, at 6:14 PM, Jeremy Lee unorthodox.engine...@gmail.com wrote:
Sort of.. there were two separate issues, but both related to AWS..
I've sorted the confusion about the Master/Worker AMI ... use
[
https://issues.apache.org/jira/browse/SPARK-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-1917.
--
Resolution: Fixed
PySpark fails to import functions from {{scipy.special
[
https://issues.apache.org/jira/browse/SPARK-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-1917:
-
Assignee: Uri Laserson
PySpark fails to import functions from {{scipy.special
What instance types did you launch on?
Sometimes you also get a bad individual machine from EC2. It might help to
remove the node it’s complaining about from the conf/slaves file.
Matei
On May 30, 2014, at 11:18 AM, PJ$ p...@chickenandwaffl.es wrote:
Hey Folks,
I'm really having quite a
[
https://issues.apache.org/jira/browse/MESOS-53?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated MESOS-53:
---
Assignee: (was: Matei Zaharia)
Master should make offers even for machines with no free memory
[
https://issues.apache.org/jira/browse/SPARK-1784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia closed SPARK-1784.
Resolution: Invalid
Fix Version/s: (was: 1.0.0)
Add a partitioner which partitions
[
https://issues.apache.org/jira/browse/SPARK-1811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-1811:
-
Assignee: Koert Kuipers
Support resizable output buffer for kryo serializer
[
https://issues.apache.org/jira/browse/SPARK-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14012520#comment-14012520
]
Matei Zaharia commented on SPARK-1518:
--
Sorry, I'm still not sure I understand what
[
https://issues.apache.org/jira/browse/SPARK-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14012651#comment-14012651
]
Matei Zaharia commented on SPARK-1518:
--
Okay, got it. But this only applies to you
[
https://issues.apache.org/jira/browse/SPARK-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14012655#comment-14012655
]
Matei Zaharia commented on SPARK-1518:
--
BTW one other thing is that in 1.0, you can
This is a pretty cool idea — instead of cache depth I’d call it something like
reference counting. Would you mind opening a JIRA issue about it?
The issue of really composing together libraries that use RDDs nicely isn’t
fully explored, but this is certainly one thing that would help with it.
are the totals:
+1: (13 votes)
Matei Zaharia*
Mark Hamstra*
Holden Karau
Nick Pentreath*
Will Benton
Henry Saputra
Sean McNamara*
Xiangrui Meng*
Andy Konwinski*
Krishna Sankar
Kevin Markey
Patrick Wendell*
Tathagata Das*
0: (1 vote)
Ankur Dave*
-1: (0 votes)
Please hold off
Hi Anand,
This is probably already handled by the RDD.pipe() operation. It will spawn a
process and let you feed data to it through its stdin and read data through
stdout.
Matei
On May 29, 2014, at 9:39 AM, ansriniv ansri...@gmail.com wrote:
I have a requirement where for every Spark
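The pipe() mechanic described above can be illustrated with Python's subprocess module: spawn an external process, write records to its stdin one per line, and read transformed records back from its stdout. This is a stand-in sketch, not Spark's implementation; the inline upper-casing script is just an example command.

```python
import subprocess
import sys

def pipe_partition(lines, argv):
    # Mimics RDD.pipe() for one partition: feed records to the child
    # process's stdin, collect its stdout lines as the output records.
    proc = subprocess.run(argv, input="\n".join(lines) + "\n",
                          capture_output=True, text=True, check=True)
    return proc.stdout.splitlines()

# A tiny external "script" (inline Python, so the sketch is portable)
# that upper-cases each input line.
cmd = [sys.executable, "-c",
       "import sys\nfor line in sys.stdin: print(line.strip().upper())"]
out = pipe_partition(["hello", "spark"], cmd)
```

With a real RDD the equivalent would be rdd.pipe("your_command"), applied to each partition's records; "your_command" is a placeholder here.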
That hash map is just a list of where each task ran, it’s not the actual data.
How many map and reduce tasks do you have? Maybe you need to give the driver a
bit more memory, or use fewer tasks (e.g. do reduceByKey(_ + _, 100) to use
only 100 tasks).
Matei
On May 29, 2014, at 2:03 AM, haitao
Quite a few people ask this question and the answer is pretty simple. When we
started Spark, we had two goals — we wanted to work with the Hadoop ecosystem,
which is JVM-based, and we wanted a concise programming interface similar to
Microsoft’s DryadLINQ (the first language-integrated big data
It can be set in an individual application.
Consolidation had some issues on ext3 as mentioned there, though we might
enable it by default in the future because other optimizations now made it
perform on par with the non-consolidation version. It also had some bugs in
0.9.0 so I’d suggest at
Matei Zaharia created SPARK-1945:
Summary: Add full Java examples in MLlib docs
Key: SPARK-1945
URL: https://issues.apache.org/jira/browse/SPARK-1945
Project: Spark
Issue Type: Sub-task
[
https://issues.apache.org/jira/browse/SPARK-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-1936.
--
Resolution: Won't Fix
We should not change these files' license headers because they're files
[
https://issues.apache.org/jira/browse/SPARK-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14011548#comment-14011548
]
Matei Zaharia commented on SPARK-1790:
--
Thanks Sujeet! Just post here when you have
[
https://issues.apache.org/jira/browse/SPARK-1790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-1790:
-
Labels: Starter (was: starter)
Update EC2 scripts to support r3 instance types
[
https://issues.apache.org/jira/browse/SPARK-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14011711#comment-14011711
]
Matei Zaharia commented on SPARK-1952:
--
Ryan, do you know what SLF4J version Pig
[
https://issues.apache.org/jira/browse/SPARK-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-1712.
--
Resolution: Fixed
ParallelCollectionRDD operations hanging forever without any error messages
[
https://issues.apache.org/jira/browse/SPARK-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-1712:
-
Priority: Major (was: Blocker)
ParallelCollectionRDD operations hanging forever without any
[
https://issues.apache.org/jira/browse/SPARK-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-1817:
-
Priority: Minor (was: Blocker)
RDD zip erroneous when partitions do not divide RDD count
[
https://issues.apache.org/jira/browse/SPARK-1817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-1817:
-
Priority: Major (was: Minor)
RDD zip erroneous when partitions do not divide RDD count
[
https://issues.apache.org/jira/browse/SPARK-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-1712:
-
Fix Version/s: 1.0.1
ParallelCollectionRDD operations hanging forever without any error
[
https://issues.apache.org/jira/browse/SPARK-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14011830#comment-14011830
]
Matei Zaharia commented on SPARK-1712:
--
Merged the frame size check into 0.9.2
[
https://issues.apache.org/jira/browse/SPARK-1518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14011942#comment-14011942
]
Matei Zaharia commented on SPARK-1518:
--
Sean, the model for linking to Hadoop has
[
https://issues.apache.org/jira/browse/SPARK-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-1112:
-
Priority: Critical (was: Blocker)
When spark.akka.frameSize > 10, task results bigger than
[
https://issues.apache.org/jira/browse/SPARK-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14011978#comment-14011978
]
Matei Zaharia commented on SPARK-1112:
--
I'm curious, why did you want to make
It sounds like you made a typo in the code — perhaps you’re trying to call
self._jvm.PythonRDDnewAPIHadoopFile instead of
self._jvm.PythonRDD.newAPIHadoopFile? There should be a dot before the new.
Matei
On May 28, 2014, at 5:25 PM, twizansk twiza...@gmail.com wrote:
Hi Nick,
I finally
You can remove cached RDDs by calling unpersist() on them.
You can also use SparkContext.getRDDStorageInfo to get info on cache usage,
though this is a developer API so it may change in future versions. We will add
a standard API eventually but this is just very closely tied to framework
[
https://issues.apache.org/jira/browse/SPARK-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010496#comment-14010496
]
Matei Zaharia commented on SPARK-1566:
--
https://github.com/apache/spark/pull/896
[
https://issues.apache.org/jira/browse/SPARK-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-1825:
-
Fix Version/s: (was: 1.0.0)
Windows Spark fails to work with Linux YARN
Matei Zaharia created SPARK-1942:
Summary: Stop clearing spark.driver.port in unit tests
Key: SPARK-1942
URL: https://issues.apache.org/jira/browse/SPARK-1942
Project: Spark
Issue Type: Task
Hi Taeyun, have you sent a pull request for this fix? We can review it there.
It’s too late to merge anything but blockers for 1.0.0 but we can do it for
1.0.1 or 1.1, depending how big the patch is.
Matei
On May 27, 2014, at 5:25 PM, innowireless TaeYun Kim
taeyun@innowireless.co.kr
+1
Tested on Mac OS X and Windows.
Matei
On May 26, 2014, at 7:38 AM, Tathagata Das tathagata.das1...@gmail.com wrote:
Please vote on releasing the following candidate as Apache Spark version
1.0.0!
This has a few important bug fixes on top of rc10:
SPARK-1900 and SPARK-1918:
I think the question for me would be does this only happen when you call
partitionBy, or always? And how common do you expect calls to partitionBy to
be? If we can wait for 1.0.1 then I’d wait on this one.
Matei
On May 26, 2014, at 10:47 PM, Patrick Wendell pwend...@gmail.com wrote:
Hey
[
https://issues.apache.org/jira/browse/SPARK-1566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia reassigned SPARK-1566:
Assignee: Matei Zaharia
Consolidate the Spark Programming Guide with tabs for all
+1
Tested it on both Windows and Mac OS X, with both Scala and Python. Confirmed
that the issues in the previous RC were fixed.
Matei
On May 20, 2014, at 5:28 PM, Marcelo Vanzin van...@cloudera.com wrote:
+1 (non-binding)
I have:
- checked signatures and checksums of the files
- built
restarting the workers usually
resolves this, but we have often seen workers disappear after a failed or killed
job.
If we see this occur again, I'll try and provide some logs.
On Mon, May 19, 2014 at 10:51 PM, Matei Zaharia matei.zaha...@gmail.com
wrote:
Which version is this with? I
Unfortunately this is not yet possible. There’s a patch in progress posted here
though: https://github.com/apache/spark/pull/455 — it would be great to get
your feedback on it.
Matei
On May 20, 2014, at 4:21 PM, twizansk twiza...@gmail.com wrote:
Hello,
This seems like a basic question
[
https://issues.apache.org/jira/browse/SPARK-1875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14001420#comment-14001420
]
Matei Zaharia commented on SPARK-1875:
--
I see, it might be fine to just remove
Matei Zaharia created SPARK-1879:
Summary: Default PermGen size too small when using Hadoop2 and Hive
Key: SPARK-1879
URL: https://issues.apache.org/jira/browse/SPARK-1879
Project: Spark
[
https://issues.apache.org/jira/browse/SPARK-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia reassigned SPARK-1879:
Assignee: Matei Zaharia
Default PermGen size too small when using Hadoop2 and Hive
[
https://issues.apache.org/jira/browse/SPARK-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14001451#comment-14001451
]
Matei Zaharia edited comment on SPARK-1879 at 5/19/14 7:25 AM
[
https://issues.apache.org/jira/browse/SPARK-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14001451#comment-14001451
]
Matei Zaharia commented on SPARK-1879:
--
BTW the warning on Java 8 is the following
[
https://issues.apache.org/jira/browse/SPARK-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14001461#comment-14001461
]
Matei Zaharia commented on SPARK-1879:
--
https://github.com/apache/spark/pull/823
[
https://issues.apache.org/jira/browse/SPARK-1857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14002275#comment-14002275
]
Matei Zaharia commented on SPARK-1857:
--
The problem is that it's not currently
[
https://issues.apache.org/jira/browse/SPARK-1874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14002369#comment-14002369
]
Matei Zaharia commented on SPARK-1874:
--
Yes, cause there's other stuff in `data`. I
“master” is where development happens, while branch-1.0, branch-0.9, etc are
for maintenance releases in those versions. Most likely if you want to
contribute you should use master. Some of the other named branches were for big
features in the past, but none are actively used now.
Matei
On
, May 19, 2014 at 1:31 PM, Matei Zaharia matei.zaha...@gmail.com
wrote:
What version is this with? We used to build each partition first before
writing it out, but this was fixed a while back (0.9.1, but it may also be in
0.9.0).
Matei
On May 19, 2014, at 12:41 AM, Sai Prasanna