I agree with this -- basically, to build on Reynold's point, you should be able
to get almost the same performance by implementing either the Hadoop FileSystem
API or the Spark Data Source API over Ignite in the right way. This would let
people save data persistently in Ignite in addition to
This means that one of your cached RDD partitions is bigger than 2 GB of data.
You can fix it by having more partitions. If you read data from a file system
like HDFS or S3, set the number of partitions higher in the sc.textFile,
hadoopFile, etc. methods (it's an optional second parameter to
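As a rough sketch (assuming an existing SparkContext named sc and an illustrative HDFS path and partition count):

    // Ask for at least 1000 partitions when reading (minPartitions parameter).
    val lines = sc.textFile("hdfs://namenode:8020/data/logs.txt", 1000)
    // An existing RDD can also be split into more partitions after the fact:
    val finer = lines.repartition(1000)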
Just FYI, it would be easiest to follow SparkR's example and add the DataFrame
API first. Other APIs will be designed to work on DataFrames (most notably
machine learning pipelines), and the surface of this API is much smaller than
that of the RDD API. This API will also give you great performance
This documentation is only for writes to an external system, but all the
counting you do within your streaming app (e.g. if you use reduceByKeyAndWindow
to keep track of a running count) is exactly-once. When you write to a storage
system, no matter which streaming framework you use, you'll
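For reference, a minimal sketch of the kind of windowed counting meant here, assuming a DStream of (String, Int) pairs named pairs; the window and slide durations are illustrative:

    import org.apache.spark.streaming.Seconds
    // Running count over a 60-second window, recomputed every 10 seconds.
    val counts = pairs.reduceByKeyAndWindow(
      (a: Int, b: Int) => a + b,  // combine counts within the window
      Seconds(60),                // window length
      Seconds(10))                // slide interval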
[4,5,6] can be invoked before the operation for offset [1,2,3]
2) If you want to achieve something similar to what TridentState does,
you'll have to do it yourself (for example, using ZooKeeper)
Is this a correct understanding?
On Wed, Jun 17, 2015 at 7:14 PM, Matei Zaharia matei.zaha
Hey all,
Over the past 1.5 months we added a number of new committers to the project,
and I wanted to welcome them now that all of their respective forms, accounts,
etc are in. Join me in welcoming the following new committers:
- Davies Liu
- DB Tsai
- Kousuke Saruta
- Sandy Ryza
- Yin Huai
I don't like the idea of removing Hadoop 1 unless it becomes a significant
maintenance burden, which I don't think it is. You'll always be surprised how
many people use old software, even though various companies may no longer
support them.
With Hadoop 2 in particular, I may be misremembering,
[ https://issues.apache.org/jira/browse/SPARK-8110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matei Zaharia updated SPARK-8110:
-
Attachment: Screen Shot 2015-06-04 at 1.51.32 PM.png
Matei Zaharia created SPARK-8110:
Summary: DAG visualizations sometimes look weird in Python
Key: SPARK-8110
URL: https://issues.apache.org/jira/browse/SPARK-8110
Project: Spark
Issue Type
+1
Tested on Mac OS X
On Jun 4, 2015, at 1:09 PM, Patrick Wendell pwend...@gmail.com wrote:
I will give +1 as well.
On Wed, Jun 3, 2015 at 11:59 PM, Reynold Xin r...@databricks.com wrote:
Let me give you the 1st
+1
On Tue, Jun 2, 2015 at 10:47 PM, Patrick Wendell
This happens automatically when you use the byKey operations, e.g. reduceByKey,
updateStateByKey, etc. Spark Streaming keeps the state for a given set of keys
on a specific node and sends new tuples with that key to that node.
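A minimal sketch of this, assuming a DStream of (String, Int) pairs named events; the running-sum update function is illustrative:

    // Keep a running sum per key; Spark Streaming co-locates each key's state.
    // (Stateful operations require ssc.checkpoint(...) to be set.)
    val running = events.updateStateByKey[Int] {
      (newValues: Seq[Int], state: Option[Int]) =>
        Some(newValues.sum + state.getOrElse(0))
    }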
Matei
On Jun 3, 2015, at 6:31 AM, allonsy luke1...@gmail.com wrote:
-XX:-UseCompressedOops
SPARK_DRIVER_MEMORY=129G
spark version: 1.1.1
Thanks a lot for your help!
2015-06-02 4:40 GMT+02:00 Matei Zaharia matei.zaha...@gmail.com:
As long as you don't use cache(), these operations will go from disk to disk,
and will only use
?
Thank you!
2015-06-02 21:25 GMT+02:00 Matei Zaharia matei.zaha...@gmail.com:
You shouldn't have to persist the RDD at all, just call flatMap and reduce on
it directly. If you try to persist it, that will try to load the original data
into memory, but here
Your best bet might be to use a map<string,string> in SQL and make the keys be
longer paths (e.g. params_param1 and params_param2). I don't think you can have
a map in some of them but not in others.
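To illustrate the suggestion, a sketch with hypothetical column and key names, using the Spark 1.3-era DataFrame API:

    import org.apache.spark.sql.SQLContext
    val sqlContext = new SQLContext(sc)
    val df = sqlContext.createDataFrame(Seq(
      (1L, Map("params_param1" -> "a", "params_param2" -> "b")),
      (2L, Map.empty[String, String])   // rows without these params keep an empty map
    )).toDF("id", "params")
    df.registerTempTable("records")
    sqlContext.sql("SELECT params['params_param1'] FROM records")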
Matei
On May 28, 2015, at 3:48 PM, Jeremy Lucas jeremyalu...@gmail.com wrote:
Hey Reynold,
Check out Apache's trademark guidelines here:
http://www.apache.org/foundation/marks/
Matei
On May 20, 2015, at 12:02 AM, Justin Pihony justin.pih...@gmail.com wrote:
What is the license on using the Spark logo? Is it free to be used for
displaying
Hey Tom,
Are you using the fine-grained or coarse-grained scheduler? For the
coarse-grained scheduler, there is a spark.cores.max config setting that will
limit the total # of cores it grabs. This was there in earlier versions too.
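For example, a sketch (the app name and core count are illustrative):

    import org.apache.spark.{SparkConf, SparkContext}
    val conf = new SparkConf()
      .setAppName("MyApp")
      .set("spark.cores.max", "16")  // cap on total cores acquired across the cluster
    val sc = new SparkContext(conf)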
Matei
On May 19, 2015, at 12:39 PM, Thomas Dudziak
of tasks per job :)
cheers,
Tom
On Tue, May 19, 2015 at 10:05 AM, Matei Zaharia matei.zaha...@gmail.com wrote:
Hey Tom,
Are you using the fine-grained or coarse-grained scheduler? For the
coarse-grained scheduler, there is a spark.cores.max config
...This is madness!
On May 14, 2015, at 9:31 AM, dmoralesdf dmora...@stratio.com wrote:
Hi there,
We have released our real-time aggregation engine based on Spark Streaming.
SPARKTA is fully open source (Apache2)
You can check out the slides shown at Strata last week:
(Sorry, for non-English people: that means it's a good thing.)
Matei
On May 14, 2015, at 10:53 AM, Matei Zaharia matei.zaha...@gmail.com wrote:
...This is madness!
On May 14, 2015, at 9:31 AM, dmoralesdf dmora...@stratio.com wrote:
Hi there,
We have released our real-time
It could also be that your hash function is expensive. What is the key class
you have for the reduceByKey / groupByKey?
Matei
On May 12, 2015, at 10:08 AM, Night Wolf nightwolf...@gmail.com wrote:
I'm seeing a similar thing with a slightly different stack trace. Ideas?
[ https://issues.apache.org/jira/browse/SPARK-7298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matei Zaharia resolved SPARK-7298.
--
Resolution: Fixed
Fix Version/s: 1.4.0
Harmonize style of new UI visualizations
We should make sure to update our docs to mention s3a as well, since many
people won't look at Hadoop's docs for this.
Matei
On May 7, 2015, at 12:57 PM, Nicholas Chammas nicholas.cham...@gmail.com
wrote:
Ah, thanks for the pointers.
So as far as Spark is concerned, is this a breaking
I don't know whether this is common, but we might also allow another separator
for JSON objects, such as two blank lines.
Matei
On May 4, 2015, at 2:28 PM, Reynold Xin r...@databricks.com wrote:
Joe - I think that's a legit and useful thing to do. Do you want to give it
a shot?
On Mon,
[ https://issues.apache.org/jira/browse/SPARK-7261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520366#comment-14520366 ]
Matei Zaharia commented on SPARK-7261:
--
IMO we can do this even without SPARK-7260
You could build Spark with Scala 2.11 on Mac / Linux and transfer it over to
Windows. AFAIK it should build on Windows too; the only problem is that Maven
might take a long time to download dependencies. What errors are you seeing?
Matei
On Apr 16, 2015, at 9:23 AM, Arun Lists
Very neat, Olivier; thanks for sharing this.
Matei
On Apr 15, 2015, at 5:58 PM, Olivier Chapelle oliv...@chapelle.cc wrote:
Dear Spark users,
I would like to draw your attention to a dataset that we recently released,
which is as of now the largest machine learning dataset ever released;
+1. Tested on Mac OS X and verified that some of the bugs were fixed.
Matei
On Apr 8, 2015, at 7:13 AM, Sean Owen so...@cloudera.com wrote:
Still a +1 from me; same result (except that now of course the
UISeleniumSuite test does not fail)
On Wed, Apr 8, 2015 at 1:46 AM, Patrick Wendell
Matei Zaharia created SPARK-6778:
Summary: SQL contexts in spark-shell and pyspark should both be
called sqlContext
Key: SPARK-6778
URL: https://issues.apache.org/jira/browse/SPARK-6778
Project
You do actually sign a CLA when you become a committer, and in general, we
should ask for CLAs from anyone who contributes a large piece of code. This is
the individual CLA: https://www.apache.org/licenses/icla.txt. Some people have
sent them proactively because their employer asks them to.
[ https://issues.apache.org/jira/browse/SPARK-6646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391456#comment-14391456 ]
Matei Zaharia commented on SPARK-6646:
--
Not to rain on the parade here, but I worry
Just a note, one challenge with the BYOH version might be that users who
download that can't run in local mode without also having Hadoop. But if we
describe it correctly then hopefully it's okay.
Matei
On Mar 24, 2015, at 3:05 PM, Patrick Wendell pwend...@gmail.com wrote:
Hey All,
For
Feel free to send a pull request to fix the doc (or say which versions it's
needed in).
Matei
On Mar 20, 2015, at 6:49 PM, Krishna Sankar ksanka...@gmail.com wrote:
Yep the command-option is gone. No big deal, just add the '%pylab inline'
command as part of your notebook.
Cheers
k/
The programming guide has a short example:
http://spark.apache.org/docs/latest/sql-programming-guide.html#json-datasets
Note that once you infer a schema for a JSON dataset, you can also use nested
path notation
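A short sketch of the idea, assuming a hypothetical people.json file with a nested address struct (Spark 1.x-era API):

    import org.apache.spark.sql.SQLContext
    val sqlContext = new SQLContext(sc)
    val people = sqlContext.jsonFile("people.json")  // schema is inferred from the data
    people.registerTempTable("people")
    sqlContext.sql("SELECT name, address.city FROM people")  // nested path notation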
[ https://issues.apache.org/jira/browse/SPARK-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359017#comment-14359017 ]
Matei Zaharia commented on SPARK-1564:
--
This is still a valid issue AFAIK, isn't
+1
Tested it on Mac OS X.
One small issue I noticed is that the Scala 2.11 build is using Hadoop 1
without Hive, which is kind of weird because people will more likely want
Hadoop 2 with Hive. So it would be good to publish a build for that
configuration instead. We can do it if we do a new
Hadoop-provided releases can help. It might kill several birds with one stone.
On Sun, Mar 8, 2015 at 11:07 PM, Matei Zaharia matei.zaha...@gmail.com
wrote:
Our goal is to let people use the latest Apache release even if vendors fall
behind or don't want to package everything, so that's
for the 2.10 build too. Pros
and cons discussed more at
https://issues.apache.org/jira/browse/SPARK-5134
https://github.com/apache/spark/pull/3917
On Sun, Mar 8, 2015 at 7:42 PM, Matei Zaharia matei.zaha...@gmail.com
wrote:
+1
Tested it on Mac OS X.
One small issue I noticed is that the Scala
Thanks! I've added you.
Matei
On Feb 17, 2015, at 4:06 PM, Ralph Bergmann | the4thFloor.eu
ra...@the4thfloor.eu wrote:
Hi,
there is a small Spark Meetup group in Berlin, Germany :-)
http://www.meetup.com/Berlin-Apache-Spark-Meetup/
Please add this group to the Meetups list at
of fact most were for it). We can still change it if somebody lays out a
strong argument.
On Tue, Jan 27, 2015 at 12:25 PM, Matei Zaharia
matei.zaha...@gmail.com
wrote:
The type alias means your methods can specify either type and they will work.
It's just another name
Thanks Denny; added you.
Matei
On Feb 9, 2015, at 10:11 PM, Denny Lee denny.g@gmail.com wrote:
Forgot to add Concur to the Powered by Spark wiki:
Concur
https://www.concur.com
Spark SQL, MLLib
Using Spark for travel and expenses analytics and personalization
Thanks!
Denny
+1
Tested on Mac OS X.
Matei
On Feb 2, 2015, at 8:57 PM, Patrick Wendell pwend...@gmail.com wrote:
Please vote on releasing the following candidate as Apache Spark version
1.2.1!
The tag to be voted on is v1.2.1-rc3 (commit b6eaf77):
[ https://issues.apache.org/jira/browse/SPARK-5654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14309782#comment-14309782 ]
Matei Zaharia commented on SPARK-5654:
--
Yup, there's a tradeoff, but given
You don't need HDFS or virtual machines to run Spark. You can just download it,
unzip it and run it on your laptop. See
http://spark.apache.org/docs/latest/index.html.
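A minimal local-mode sketch (no cluster, VM, or HDFS involved; names illustrative):

    import org.apache.spark.{SparkConf, SparkContext}
    val sc = new SparkContext(new SparkConf().setMaster("local[*]").setAppName("laptop-demo"))
    println(sc.parallelize(1 to 1000).sum())  // runs entirely on local threads
    sc.stop()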
Matei
On Feb 6, 2015, at 2:58 PM, David Fallside falls...@us.ibm.com wrote:
[ https://issues.apache.org/jira/browse/SPARK-5608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matei Zaharia resolved SPARK-5608.
--
Resolution: Fixed
Fix Version/s: 1.3.0
Improve SEO of Spark documentation site to let
Hi all,
The PMC recently voted to add three new committers: Cheng Lian, Joseph Bradley
and Sean Owen. All three have been major contributors to Spark in the past
year: Cheng on Spark SQL, Joseph on MLlib, and Sean on ML and many pieces
throughout Spark Core. Join me in welcoming them as
This looks like a pretty serious problem, thanks! Glad people are testing on
Windows.
Matei
On Jan 31, 2015, at 11:57 AM, MartinWeindel martin.wein...@gmail.com wrote:
FYI: Spark 1.2.1rc2 does not work on Windows!
On creating a Spark context you get the following log output on my Windows
a package name for it that omits sql.
I would also be in favor of adding a separate Spark Schema module for Spark
SQL to rely on, but I imagine that might be too large a change at this point?
-Sandy
On Mon, Jan 26, 2015 at 5:32 PM, Matei Zaharia
matei.zaha...@gmail.com
wrote:
(Actually
I believe this is needed for driver recovery in Spark Streaming. If your Spark
driver program crashes, Spark Streaming can recover the application by reading
the set of DStreams and output operations from a checkpoint file (see
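A rough sketch of that recovery pattern, with an illustrative checkpoint directory; conf is assumed to be an existing SparkConf:

    import org.apache.spark.streaming.{Seconds, StreamingContext}
    def createContext(): StreamingContext = {
      val ssc = new StreamingContext(conf, Seconds(1))
      ssc.checkpoint("hdfs:///checkpoints/myapp")
      // ... define DStreams and output operations here ...
      ssc
    }
    // On restart, this rebuilds the DStream graph from the checkpoint if one exists.
    val ssc = StreamingContext.getOrCreate("hdfs:///checkpoints/myapp", createContext _)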
(Actually when we designed Spark SQL we thought of giving it another name, like
Spark Schema, but we decided to stick with SQL since that was the most obvious
use case to many users.)
Matei
On Jan 26, 2015, at 5:31 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
While it might be possible
While it might be possible to move this concept to Spark Core long-term,
supporting structured data efficiently does require quite a bit of the
infrastructure in Spark SQL, such as query planning and columnar storage. The
intent of Spark SQL though is to be more than a SQL server -- it's meant
It's hard to tell without more details, but the start-up latency in Hive can
sometimes be high, especially if you are running Hive on MapReduce. MR just
takes 20-30 seconds per job to spin up even if the job is doing nothing.
For real use of Spark SQL for short queries by the way, I'd recommend
+1 on this.
On Jan 17, 2015, at 6:16 PM, Reza Zadeh r...@databricks.com wrote:
LGTM
On Sat, Jan 17, 2015 at 5:40 PM, Patrick Wendell pwend...@gmail.com wrote:
Hey All,
Just wanted to ping about a minor issue - but one that ends up having
consequence given Spark's volume of reviews
Unfortunately we don't have anything to do with Spark on GCE, so I'd suggest
asking in the GCE support forum. You could also try to launch a Spark cluster
by hand on nodes in there. Sigmoid Analytics published a package for this here:
http://spark-packages.org/package/9
Matei
On Jan 17,
The Apache Spark project should work with it, but I'm not sure you can get
support from HDP (if you have that).
Matei
On Jan 16, 2015, at 5:36 PM, Judy Nash judyn...@exchange.microsoft.com
wrote:
Should clarify on this. I personally have used HDP 2.1 + Spark 1.2 and have
not seen a
Yeah, very cool! You may also want to check out
https://issues.apache.org/jira/browse/SPARK-5097 as something to build upon for
these operations.
Matei
On Jan 14, 2015, at 6:18 PM, Reynold Xin r...@databricks.com wrote:
Chris,
This is really cool. Congratulations and thanks for sharing
[ https://issues.apache.org/jira/browse/SPARK-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matei Zaharia updated SPARK-5088:
-
Fix Version/s: (was: 1.2.1)
Use spark-class for running executors directly on mesos
[ https://issues.apache.org/jira/browse/SPARK-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matei Zaharia updated SPARK-5088:
-
Target Version/s: 1.3.0 (was: 1.3.0, 1.2.1)
Use spark-class for running executors directly
Is this in the Spark shell? Case classes don't work correctly in the Spark
shell unfortunately (though they do work in the Scala shell) because we change
the way lines of code compile to allow shipping functions across the network.
The best way to get case classes in there is to compile them
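For instance (hypothetical names), put the case class in a compiled project:

    // src/main/scala/com/example/Point.scala in a compiled project
    package com.example
    case class Point(x: Double, y: Double)

Then build it into a JAR, start the shell with bin/spark-shell --jars your-app.jar, and import com.example.Point.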
[ https://issues.apache.org/jira/browse/SPARK-3619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matei Zaharia resolved SPARK-3619.
--
Resolution: Fixed
Fix Version/s: 1.3.0
Assignee: Jongyoul Lee (was: Timothy
FYI, ApacheCon North America call for papers is up.
Matei
Begin forwarded message:
Date: January 5, 2015 at 9:40:41 AM PST
From: Rich Bowen rbo...@rcbowen.com
Reply-To: dev d...@community.apache.org
To: dev d...@community.apache.org
Subject: ApacheCon North America 2015 Call For Papers
This file needs to be on your CLASSPATH actually, not just in a directory. The
best way to pass it in is probably to package it into your application JAR. You
can put it in src/main/resources in a Maven or SBT project, and check that it
makes it into the JAR using jar tf yourfile.jar.
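As a sketch (the resource name is illustrative), code can then load it from the classpath:

    // Reads a file bundled at src/main/resources/myconf.xml in the application JAR.
    val in = getClass.getResourceAsStream("/myconf.xml")
    require(in != null, "myconf.xml is not on the classpath")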
Matei
[ https://issues.apache.org/jira/browse/SPARK-4660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260544#comment-14260544 ]
Matei Zaharia commented on SPARK-4660:
--
[~pkolaczk] mind sending a pull request
Please ask someone else to assign them for now, and just comment on them that
you're working on them. Over time if you contribute a bunch we'll add you to
that list. The problem is that in the past, people would assign issues to
themselves and never actually work on them, making it confusing
Hey Eric, sounds like you are running into several issues, but thanks for
reporting them. Just to comment on a few of these:
I'm not seeing RDDs or SRDDs cached in the Spark UI. That page remains empty
despite my calling cache().
This is expected until you compute the RDDs the first time
Yup, as he posted before: "An Apache infrastructure issue prevented me from
pushing this last night. The issue was resolved today and I should be able to
push the final release artifacts tonight."
On Dec 18, 2014, at 10:14 PM, Andrew Ash and...@andrewash.com wrote:
Patrick is working on the
The problem is very likely NFS, not Spark. What kind of network is it mounted
over? You can also test the performance of your NFS by copying a file from it
to a local disk or to /dev/null and seeing how many bytes per second it can
copy.
Matei
On Dec 17, 2014, at 9:38 AM, Larryliu
is running on the same server that Spark
is running on. So basically I mount the NFS on the same bare metal machine.
Larry
On Wed, Dec 17, 2014 at 11:42 AM, Matei Zaharia matei.zaha...@gmail.com wrote:
The problem is very likely NFS, not Spark. What kind
It's just Bootstrap checked into SVN and built using Jekyll. You can check out
the raw source files from SVN at https://svn.apache.org/repos/asf/spark. IMO
it's fine if you guys use the layout, but just make sure it doesn't look
exactly the same because otherwise both sites will look like
Spark SQL is already available, the reason for the alpha component label is
that we are still tweaking some of the APIs so we have not yet guaranteed API
stability for it. However, that is likely to happen soon (possibly 1.3). One of
the major things added in Spark 1.2 was an external data
[ https://issues.apache.org/jira/browse/SPARK-3247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243253#comment-14243253 ]
Matei Zaharia commented on SPARK-3247:
--
For those looking to learn about
You can just do mapPartitions on the whole RDD, and then call sliding() on
the iterator in each one to get a sliding window. One problem is that you will
not be able to slide forward into the next partition at partition boundaries.
If this matters to you, you need to do something more
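A minimal sketch of the per-partition version, with an illustrative window size; as noted, windows will not span partition boundaries:

    // Each partition yields its own sliding windows of 3 consecutive elements.
    val windows = rdd.mapPartitions(iter => iter.sliding(3))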
+1
Tested on Mac OS X.
Matei
On Dec 10, 2014, at 1:08 PM, Patrick Wendell pwend...@gmail.com wrote:
Please vote on releasing the following candidate as Apache Spark version
1.2.0!
The tag to be voted on is v1.2.0-rc2 (commit a428c446e2):
[ https://issues.apache.org/jira/browse/SPARK-4690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1429#comment-1429 ]
Matei Zaharia commented on SPARK-4690:
--
Yup, that's the definition
[ https://issues.apache.org/jira/browse/SPARK-4690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matei Zaharia closed SPARK-4690.
Resolution: Invalid
AppendOnlyMap seems not using Quadratic probing as the JavaDoc
I'd suggest asking about this on the Mesos list (CCed). As far as I know, there
was actually some ongoing work for this.
Matei
On Dec 3, 2014, at 9:46 AM, Dick Davies d...@hellooperator.net wrote:
Just wondered if anyone had managed to start spark
jobs on mesos wrapped in a docker
Matei Zaharia created SPARK-4683:
Summary: Add a beeline.cmd to run on Windows
Key: SPARK-4683
URL: https://issues.apache.org/jira/browse/SPARK-4683
Project: Spark
Issue Type: New Feature
Matei Zaharia created SPARK-4684:
Summary: Add a script to run JDBC server on Windows
Key: SPARK-4684
URL: https://issues.apache.org/jira/browse/SPARK-4684
Project: Spark
Issue Type: New
[ https://issues.apache.org/jira/browse/SPARK-4685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matei Zaharia updated SPARK-4685:
-
Priority: Trivial (was: Major)
Update JavaDoc settings to include spark.ml and all spark.mllib
Matei Zaharia created SPARK-4685:
Summary: Update JavaDoc settings to include spark.ml and all
spark.mllib subpackages in the right sections
Key: SPARK-4685
URL: https://issues.apache.org/jira/browse/SPARK-4685
[ https://issues.apache.org/jira/browse/SPARK-4685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matei Zaharia updated SPARK-4685:
-
Target Version/s: 1.2.1 (was: 1.2.0)
Update JavaDoc settings to include spark.ml and all
+0.9 from me. Tested it on Mac and Windows (someone has to do it) and while
things work, I noticed a few recent scripts don't have Windows equivalents,
namely https://issues.apache.org/jira/browse/SPARK-4683 and
https://issues.apache.org/jira/browse/SPARK-4684. The first one at least would
be
Hi Ryan,
As a tip (and maybe this isn't documented well), I normally use SBT for
development to avoid the slow build process, and use its interactive console to
run only specific tests. The nice advantage is that SBT can keep the Scala
compiler loaded and JITed across builds, making it faster
the timeout for waiting
for a maintainer to a week. Hopefully this will provide more options for
reviewing in these components.
The complete list is available at
https://cwiki.apache.org/confluence/display/SPARK/Committers.
Matei
On Nov 8, 2014, at 7:28 PM, Matei Zaharia matei.zaha...@gmail.com
Hey Patrick, unfortunately you got some of the text here wrong, saying 1.1.0
instead of 1.2.0. Not sure it will matter since there can well be another RC
after testing, but we should be careful.
Matei
On Nov 28, 2014, at 9:16 PM, Patrick Wendell pwend...@gmail.com wrote:
Please vote on
[ https://issues.apache.org/jira/browse/SPARK-4613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matei Zaharia resolved SPARK-4613.
--
Resolution: Fixed
Fix Version/s: 1.2.0
Make JdbcRDD easier to use from Java
[ https://issues.apache.org/jira/browse/SPARK-4613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matei Zaharia updated SPARK-4613:
-
Issue Type: Improvement (was: Bug)
Make JdbcRDD easier to use from Java
[ https://issues.apache.org/jira/browse/SPARK-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matei Zaharia resolved SPARK-3628.
--
Resolution: Fixed
Fix Version/s: 1.2.0
Target Version/s: 1.1.2 (was: 0.9.3
[ https://issues.apache.org/jira/browse/SPARK-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14227077#comment-14227077 ]
Matei Zaharia commented on SPARK-3628:
--
FYI I merged this into 1.2.0, since the patch
[ https://issues.apache.org/jira/browse/SPARK-732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14227108#comment-14227108 ]
Matei Zaharia commented on SPARK-732:
-
As discussed on https://github.com/apache/spark
[ https://issues.apache.org/jira/browse/SPARK-3628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matei Zaharia reopened SPARK-3628:
--
Don't apply accumulator updates multiple times for tasks in result stages
Instead of SPARK_WORKER_INSTANCES you can also set SPARK_WORKER_CORES, to have
one worker that thinks it has more cores.
Matei
On Nov 26, 2014, at 5:01 PM, Yotto Koga yotto.k...@autodesk.com wrote:
Thanks Sean. That worked out well.
For anyone who happens onto this post and wants to do
Matei Zaharia created SPARK-4613:
Summary: Make JdbcRDD easier to use from Java
Key: SPARK-4613
URL: https://issues.apache.org/jira/browse/SPARK-4613
Project: Spark
Issue Type: Bug
[ https://issues.apache.org/jira/browse/SPARK-4613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14225615#comment-14225615 ]
Matei Zaharia commented on SPARK-4613:
--
BTW the strawman for this would be a version
The main reason for the alpha tag is actually that APIs might still be
evolving, but we'd like to freeze the API as soon as possible. Hopefully it
will happen in one of 1.3 or 1.4. In Spark 1.2, we're adding an external data
source API that we'd like to get experience with before freezing it.
How are you creating the object in your Scala shell? Maybe you can write a
function that directly returns the RDD, without assigning the object to a
temporary variable.
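A sketch of this, with a stand-in for the problematic object; sc is the shell's SparkContext:

    // The helper never becomes a top-level shell binding, so it is not
    // captured in closures shipped to executors.
    def makeRdd() = {
      val helper = new java.util.Random(42)             // stand-in for the real object
      sc.parallelize(Seq.fill(1000)(helper.nextInt()))  // only the data leaves the function
    }
    val rdd = makeRdd()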
Matei
On Nov 5, 2014, at 2:54 PM, Corey Nolet cjno...@gmail.com wrote:
The closer I look @ the stack trace in the Scala
On Tue, Nov 25, 2014 at 5:31 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
How are you creating the object in your Scala shell? Maybe you can write a
function that directly returns the RDD, without assigning the object to a
temporary variable.
Matei
You can do sbt/sbt assembly/assembly to assemble only the main package.
Matei
On Nov 25, 2014, at 7:50 PM, lihu lihu...@gmail.com wrote:
Hi,
The Spark assembly build is time-costly. If I only need the
spark-assembly-1.1.0-hadoop2.3.0.jar, do not need the
BTW as another tip, it helps to keep the SBT console open as you make source
changes (by just running sbt/sbt with no args). It's a lot faster the second
time it builds something.
Matei
On Nov 25, 2014, at 8:31 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
You can do sbt/sbt assembly