[
https://issues.apache.org/jira/browse/SPARK-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14222545#comment-14222545
]
Matei Zaharia commented on SPARK-3633:
--
[~stephen] you can try the 1.1.1 RC in
http
Interesting, perhaps we could publish each one with two IDs, of which the rc
one is unofficial. The problem is indeed that you have to vote on a hash for a
potentially final artifact.
Matei
On Nov 23, 2014, at 7:54 PM, Stephen Haberman stephen.haber...@gmail.com
wrote:
Hi,
I wanted to
You can still send patches for docs until the release goes out -- please do if
you see stuff.
Matei
On Nov 20, 2014, at 6:39 AM, Madhu ma...@madhu.com wrote:
Thanks Patrick.
I've been testing some 1.2 features, looks good so far.
I have some example code that I think will be helpful for
-rc2/
http://people.apache.org/~andrewor14/spark-1.1.1-rc2/
On Thu, Nov 20, 2014 at 11:48 AM, Matei Zaharia matei.zaha...@gmail.com wrote:
Hector, is this a comment on 1.1.1 or on the 1.2 preview?
Matei
On Nov 20, 2014, at 11:39 AM, Hector Yee hector
Your Hadoop configuration is set to look for this file to determine racks. Is
the file present on cluster nodes? If not, look at your hdfs-site.xml and
remove the setting for a rack topology script there (or it might be in
core-site.xml).
Matei
On Nov 19, 2014, at 12:13 PM, Arun Luthra
[
https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216691#comment-14216691
]
Matei Zaharia commented on SPARK-4452:
--
BTW I've thought about this more and here's
[
https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217331#comment-14217331
]
Matei Zaharia commented on SPARK-4452:
--
Forced spilling is orthogonal to how you set
[
https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215425#comment-14215425
]
Matei Zaharia commented on SPARK-4452:
--
How much of this gets fixed if you fix
[
https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215557#comment-14215557
]
Matei Zaharia commented on SPARK-4452:
--
BTW we may also want to create a separate
[
https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215556#comment-14215556
]
Matei Zaharia commented on SPARK-4452:
--
Got it. It would be fine to do this if you
[
https://issues.apache.org/jira/browse/SPARK-4306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-4306:
-
Target Version/s: 1.2.0
LogisticRegressionWithLBFGS support for PySpark MLlib
[
https://issues.apache.org/jira/browse/SPARK-4306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14214134#comment-14214134
]
Matei Zaharia commented on SPARK-4306:
--
[~srinathsmn] I've assigned it to you. When
[
https://issues.apache.org/jira/browse/SPARK-4306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-4306:
-
Assignee: Varadharajan
LogisticRegressionWithLBFGS support for PySpark MLlib
Matei Zaharia created SPARK-4435:
Summary: Add setThreshold in Python LogisticRegressionModel and
SVMModel
Key: SPARK-4435
URL: https://issues.apache.org/jira/browse/SPARK-4435
Project: Spark
[
https://issues.apache.org/jira/browse/SPARK-4434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14214155#comment-14214155
]
Matei Zaharia commented on SPARK-4434:
--
[~joshrosen] make sure to revert this on 1.2
Matei Zaharia created SPARK-4439:
Summary: Export RandomForest in Python
Key: SPARK-4439
URL: https://issues.apache.org/jira/browse/SPARK-4439
Project: Spark
Issue Type: New Feature
[
https://issues.apache.org/jira/browse/SPARK-4439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-4439:
-
Summary: Expose RandomForest in Python (was: Export RandomForest in Python)
Expose RandomForest
+1
Tested on Mac OS X, and verified that sort-based shuffle bug is fixed.
Matei
On Nov 14, 2014, at 10:45 AM, Andrew Or and...@databricks.com wrote:
Hi all, since the vote ends on a Sunday, please let me know if you would
like to extend the deadline to allow more time for testing.
[
https://issues.apache.org/jira/browse/SPARK-4330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-4330.
--
Resolution: Fixed
Fix Version/s: 1.2.0
1.1.1
Target
[
https://issues.apache.org/jira/browse/SPARK-4330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-4330:
-
Assignee: Kousuke Saruta
Link to proper URL for YARN overview
Just curious, what are the pros and cons of this? Can the 0.8.1.1 client still
talk to 0.8.0 versions of Kafka, or do you need it to match your Kafka version
exactly?
Matei
On Nov 10, 2014, at 9:48 AM, Bhaskar Dutta bhas...@gmail.com wrote:
Hi,
Is there any plan to bump the Kafka
Hey Sandy,
Try using the -Dsun.io.serialization.extendedDebugInfo=true flag on the JVM to
print the contents of the objects. In addition, something else that helps is to
do the following:
{
val _arr = arr
models.map(... _arr ...)
}
Basically, copy the global variable into a local one.
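As a rough Scala sketch of that trick (the class, field, and RDD names here are made up for illustration): referencing a field of the enclosing object inside the closure drags the whole object into the serialized task, while a local copy captures only the array.
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
class ModelScorer(sc: SparkContext) {
  // Field on a (possibly non-serializable) enclosing class.
  val weights: Array[Double] = Array(0.1, 0.2, 0.3)
  def score(models: RDD[Array[Double]]): RDD[Double] = {
    // Copy the field into a local val so the closure captures only the array,
    // not the whole ModelScorer instance.
    val localWeights = weights
    models.map(m => m.zip(localWeights).map { case (x, w) => x * w }.sum)
  }
}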
Call getNumPartitions() on your RDD to make sure it has the right number of
partitions. You can also specify it when doing parallelize, e.g.
rdd = sc.parallelize(xrange(1000), 10)
This should run in parallel if you have multiple partitions and cores, but it
might be that during part of the
is just to have a better
structure for reviewing and minimize the chance of errors.
Here is a tally of the votes:
Binding votes (from PMC): 17 +1, no 0 or -1
Matei Zaharia
Michael Armbrust
Reynold Xin
Patrick Wendell
Andrew Or
Prashant Sharma
Mark Hamstra
Xiangrui Meng
Ankur Dave
Imran Rashid
Jason
It might mean that some partition was computed on two nodes, because a task for
it wasn't able to be scheduled locally on the first node. Did the RDD really
have 426 partitions total? You can click on it and see where there are copies
of each one.
Matei
On Nov 8, 2014, at 10:16 PM, Nathan
[
https://issues.apache.org/jira/browse/SPARK-4303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14203147#comment-14203147
]
Matei Zaharia commented on SPARK-4303:
--
Yup, this will actually become easier
[
https://issues.apache.org/jira/browse/SPARK-4186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-4186.
--
Resolution: Fixed
Fix Version/s: 1.2.0
Support binaryFiles and binaryRecords API
[
https://issues.apache.org/jira/browse/SPARK-644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-644.
-
Resolution: Fixed
Jobs canceled due to repeated executor failures may hang
[
https://issues.apache.org/jira/browse/SPARK-643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-643.
-
Resolution: Fixed
Standalone master crashes during actor restart
[
https://issues.apache.org/jira/browse/SPARK-677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200514#comment-14200514
]
Matei Zaharia commented on SPARK-677:
-
[~joshrosen] is this fixed now?
PySpark should
[
https://issues.apache.org/jira/browse/SPARK-681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-681.
-
Resolution: Fixed
Optimize hashtables used in Spark
[
https://issues.apache.org/jira/browse/SPARK-993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-993.
-
Resolution: Won't Fix
We investigated this for 1.0 but found that many InputFormats behave wrongly
[
https://issues.apache.org/jira/browse/SPARK-993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14200531#comment-14200531
]
Matei Zaharia commented on SPARK-993:
-
Arun, you'd see this issue if you do collect
[
https://issues.apache.org/jira/browse/SPARK-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia closed SPARK-1000.
Resolution: Cannot Reproduce
Crash when running SparkPi example with local-cluster
[
https://issues.apache.org/jira/browse/SPARK-1023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-1023.
--
Resolution: Fixed
Remove Thread.sleep(5000) from TaskSchedulerImpl
[
https://issues.apache.org/jira/browse/SPARK-1185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-1185.
--
Resolution: Fixed
In Spark Programming Guide, Master URLs should mention yarn-client
[
https://issues.apache.org/jira/browse/SPARK-2237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia closed SPARK-2237.
Resolution: Won't Fix
Add ZLIBCompressionCodec code
[
https://issues.apache.org/jira/browse/SPARK-2348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-2348:
-
Priority: Critical (was: Major)
In Windows having an environment variable named 'classpath
traffic, and be very active in design API discussions.
That leads to better consistency and long-term design choices.
Cheers,
bc
On Wed, Nov 5, 2014 at 5:31 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
Hi all,
I wanted to share a discussion we've
On Wed, Nov 05, 2014 at 05:31:58PM -0800, Matei Zaharia wrote:
Hi all,
I wanted to share a discussion we've been having on the PMC list, as well
as call for an official vote on it on a public list. Basically, as the
Spark project scales up, we need to define a model to make sure
Alright, Greg, I think I understand how Subversion's model is different, which
is that the PMC members are all full committers. However, I still think that
the model proposed here is purely organizational (how the PMC and committers
organize themselves), and in no way changes people's ownership
[
https://issues.apache.org/jira/browse/SPARK-4222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-4222:
-
Assignee: Jascha Swisher
FixedLengthBinaryRecordReader should readFully
[
https://issues.apache.org/jira/browse/SPARK-4222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-4222.
--
Resolution: Fixed
Fix Version/s: 1.2.0
FixedLengthBinaryRecordReader should readFully
[
https://issues.apache.org/jira/browse/SPARK-4040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-4040:
-
Assignee: jay vyas
Update spark documentation for local mode and spark-streaming
[
https://issues.apache.org/jira/browse/SPARK-4040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-4040.
--
Resolution: Fixed
Update spark documentation for local mode and spark-streaming
[
https://issues.apache.org/jira/browse/SPARK-565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-565.
-
Resolution: Won't Fix
FYI I'm going to close this because we've locked down the API for 1.X
[
https://issues.apache.org/jira/browse/SPARK-542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia closed SPARK-542.
---
Resolution: Won't Fix
New versions of Spark have ways to specify the hostname and IP address to bind
[
https://issues.apache.org/jira/browse/SPARK-600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-600.
-
Resolution: Fixed
Should no longer be a problem since 1.0
SparkContext.stop and clearJars delete
[
https://issues.apache.org/jira/browse/SPARK-619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-619.
-
Resolution: Fixed
Hadoop MapReduce should be configured to use all local disks for shuffle
[
https://issues.apache.org/jira/browse/SPARK-656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-656.
-
Resolution: Fixed
Let Amazon choose our EC2 clusters' availability zone if the user does
[
https://issues.apache.org/jira/browse/SPARK-610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-610.
-
Resolution: Fixed
Fix Version/s: 0.8.1
Assignee: Aaron Davidson
Support master
[
https://issues.apache.org/jira/browse/SPARK-785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199922#comment-14199922
]
Matei Zaharia commented on SPARK-785:
-
[~adav] it still seems to be, weirdly enough
[
https://issues.apache.org/jira/browse/SPARK-812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-812.
-
Resolution: Invalid
No longer a problem for new versions of the Netty shuffle
Netty shuffle
[
https://issues.apache.org/jira/browse/SPARK-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-880.
-
Resolution: Fixed
When built with Hadoop2, spark-shell and examples don't initialize log4j
[
https://issues.apache.org/jira/browse/SPARK-824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-824.
-
Resolution: Fixed
This is a pretty old issue that no longer affects the newest block manager
[
https://issues.apache.org/jira/browse/SPARK-914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-914.
-
Resolution: Fixed
Fix Version/s: 1.0.0
Make RDD implement Scala and Java Iterable
[
https://issues.apache.org/jira/browse/SPARK-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-1063.
--
Resolution: Fixed
Add .sortBy(f) method on RDD
this
happen.
Updated blog post:
http://databricks.com/blog/2014/11/05/spark-officially-sets-a-new-record-in-large-scale-sorting.html
On Fri, Oct 10, 2014 at 7:54 AM, Matei Zaharia matei.zaha...@gmail.com
wrote:
Hi folks,
I interrupt your regularly scheduled user / dev list to bring you
Hi all,
I wanted to share a discussion we've been having on the PMC list, as well as
call for an official vote on it on a public list. Basically, as the Spark
project scales up, we need to define a model to make sure there is still great
oversight of key components (in particular internal
need a maintainer for Mesos, and I wonder if there
is someone that can be added to that?
Tim
On Wed, Nov 5, 2014 at 5:31 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
Hi all,
I wanted to share a discussion we've been having on the PMC list, as well as
call for an official vote
Yup, the Hadoop nodes were from 2013, each with 64 GB RAM, 12 cores, 10 Gbps
Ethernet and 12 disks. For 100 TB of data, the intermediate data could fit in
memory on this cluster, which can make shuffle much faster than with
intermediate data on SSDs. You can find the specs in
, 2014 at 1:31 AM, Matei Zaharia matei.zaha...@gmail.com wrote:
Hi all,
I wanted to share a discussion we've been having on the PMC list, as well as
call for an official vote on it on a public list. Basically, as the Spark
project scales up, we need to define a model to make sure there is still
for me to do that? Collect RDD in driver first and create broadcast? Or
any shortcut in spark for this?
Thanks!
-Original Message-
From: Shuai Zheng [mailto:szheng.c...@gmail.com]
Sent: Wednesday, November 05, 2014 3:32 PM
To: 'Matei Zaharia'
Cc: 'user@spark.apache.org'
Subject
Is this about Spark SQL vs Redshift, or Spark in general? Spark in general
provides a broader set of capabilities than Redshift because it has APIs in
general-purpose languages (Java, Scala, Python) and libraries for things like
machine learning and graph processing. For example, you might use
exported from Redshift into Spark or Hadoop.
Matei
On Nov 4, 2014, at 3:51 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
Is this about Spark SQL vs Redshift, or Spark in general? Spark in general
provides a broader set of capabilities than Redshift because it has APIs in
general-purpose
In Spark 1.1, the sort-based shuffle (spark.shuffle.manager=sort) will have
better performance while creating fewer files. So I'd suggest trying that too.
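For reference, a minimal sketch of turning it on via SparkConf (the app name and master are placeholders; in 1.1 the default is still the hash-based shuffle):
import org.apache.spark.{SparkConf, SparkContext}
val conf = new SparkConf()
  .setAppName("shuffle-test")           // placeholder app name
  .setMaster("local[4]")                // placeholder master
  .set("spark.shuffle.manager", "sort") // switch from the hash-based default in 1.1
val sc = new SparkContext(conf)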
Matei
On Nov 3, 2014, at 6:12 PM, Andrew Or and...@databricks.com wrote:
Hey Matt,
There's some prior work that compares
(BTW this had a bug with negative hash codes in 1.1.0 so you should try
branch-1.1 for it).
Matei
On Nov 3, 2014, at 6:28 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
In Spark 1.1, the sort-based shuffle (spark.shuffle.manager=sort) will have
better performance while creating fewer
You need to use broadcast followed by flatMap or mapPartitions to do map-side
joins (in your map function, you can look at the hash table you broadcast and
see what records match it). Spark SQL also does it by default for tables
smaller than the spark.sql.autoBroadcastJoinThreshold setting (by
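A minimal Scala sketch of that pattern (the tables and keys here are invented): broadcast the small table once, then have each record of the large RDD probe the broadcast map locally, so no shuffle is needed.
import org.apache.spark.SparkContext
val sc = new SparkContext("local[4]", "broadcast-join-sketch")
// Small lookup table, broadcast once to every executor.
val regions = Map(1 -> "us", 2 -> "eu")
val regionsBc = sc.broadcast(regions)
// Large side of the join.
val orders = sc.parallelize(Seq((1, "order-a"), (2, "order-b"), (3, "order-c")))
// Map-side join: keep only records whose key appears in the broadcast table.
val joined = orders.flatMap { case (k, v) =>
  regionsBc.value.get(k).map(region => (k, (v, region)))
}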
[
https://issues.apache.org/jira/browse/SPARK-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-3466.
--
Resolution: Fixed
Fix Version/s: 1.2.0
Limit size of results that a driver collects
[
https://issues.apache.org/jira/browse/SPARK-2759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-2759.
--
Resolution: Fixed
Fix Version/s: 1.2.0
The ability to read binary files into Spark
Matei Zaharia created SPARK-4186:
Summary: Support binaryFiles and binaryRecords API in Python
Key: SPARK-4186
URL: https://issues.apache.org/jira/browse/SPARK-4186
Project: Spark
Issue Type
[
https://issues.apache.org/jira/browse/SPARK-4186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193363#comment-14193363
]
Matei Zaharia commented on SPARK-4186:
--
[~davies] it would be great if you have
[
https://issues.apache.org/jira/browse/SPARK-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-3932.
--
Resolution: Fixed
Fix Version/s: 1.2.0
Done in https://github.com/apache/spark/pull/2983
[
https://issues.apache.org/jira/browse/SPARK-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14193666#comment-14193666
]
Matei Zaharia commented on SPARK-3931:
--
Done in https://github.com/apache/spark/pull
[
https://issues.apache.org/jira/browse/SPARK-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-3931.
--
Resolution: Fixed
Fix Version/s: 1.2.0
Support reading fixed-precision decimals from
[
https://issues.apache.org/jira/browse/SPARK-3929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-3929.
--
Resolution: Fixed
Fix Version/s: 1.2.0
Support for fixed-precision decimal
Try unionAll, which is a special method on SchemaRDDs that keeps the schema on
the results.
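A minimal sketch of the difference (the file paths are placeholders): unionAll is defined on SchemaRDD and keeps the schema, whereas SparkContext.union returns a plain RDD of Rows.
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
val sc = new SparkContext("local[2]", "union-sketch")
val sqlContext = new SQLContext(sc)
// Load the two Parquet tables (placeholder paths) as SchemaRDDs.
val a = sqlContext.parquetFile("/data/tableA.parquet")
val b = sqlContext.parquetFile("/data/tableB.parquet")
// unionAll preserves the schema on the combined SchemaRDD.
val combined = a.unionAll(b)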
Matei
On Nov 1, 2014, at 3:57 PM, Daniel Mahler dmah...@gmail.com wrote:
I would like to combine 2 parquet tables I have create.
I tried:
sc.union(sqx.parquetFile(fileA),
Matei. What does unionAll do if the input RDD schemas are not 100%
compatible. Does it take the union of the columns and generalize the types?
thanks
Daniel
On Sat, Nov 1, 2014 at 6:08 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
Try unionAll, which
[
https://issues.apache.org/jira/browse/SPARK-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-3561:
-
Fix Version/s: (was: 1.2.0)
Allow for pluggable execution contexts in Spark
Matei Zaharia created SPARK-4176:
Summary: Support decimals with precision 18 in Parquet
Key: SPARK-4176
URL: https://issues.apache.org/jira/browse/SPARK-4176
Project: Spark
Issue Type: New
[
https://issues.apache.org/jira/browse/SPARK-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia resolved SPARK-1847.
--
Resolution: Fixed
Fix Version/s: 1.2.0
Pushdown filters on non-required parquet columns
[
https://issues.apache.org/jira/browse/SPARK-3968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-3968:
-
Assignee: Yash Datta
Use parquet-mr filter2 api in spark sql
[
https://issues.apache.org/jira/browse/SPARK-1847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-1847:
-
Assignee: Yash Datta
Pushdown filters on non-required parquet columns
You don't have to call it if you just exit your application, but it's useful
for example in unit tests if you want to create and shut down a separate
SparkContext for each test.
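For instance, a small sketch of that pattern in a test helper (the names are made up):
import org.apache.spark.{SparkConf, SparkContext}
// Give each test its own SparkContext and stop it afterwards so the next
// test can create a fresh one.
def withSparkContext(testBody: SparkContext => Unit): Unit = {
  val conf = new SparkConf().setMaster("local[2]").setAppName("test")
  val sc = new SparkContext(conf)
  try testBody(sc) finally sc.stop()
}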
Matei
On Oct 31, 2014, at 10:39 AM, Evan R. Sparks evan.spa...@gmail.com wrote:
In cluster settings if you don't
Try using --jars instead of the driver-only options; they should work with
spark-shell too but they may be less tested.
Unfortunately, you do have to specify each JAR separately; you can maybe use a
shell script to list a directory and get a big list, or set up a project that
builds all of the
to spark-shell. Correct? If so I will file a bug
report since this is definitely not the case.
On Thu, Oct 30, 2014 at 5:39 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
Try using --jars instead of the driver-only options; they should work with
spark-shell
Good catch! If you'd like, you can send a pull request changing the files in
docs/ to do this (see
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark),
otherwise maybe open an issue on
[
https://issues.apache.org/jira/browse/SPARK-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-3466:
-
Priority: Critical (was: Major)
Limit size of results that a driver collects for each action
Hi Stephen,
How did you generate your Maven workspace? You need to make sure the Hive
profile is enabled for it. For example sbt/sbt -Phive gen-idea.
Matei
On Oct 28, 2014, at 7:42 PM, Stephen Boesch java...@gmail.com wrote:
I have run on the command line via maven and it is fine:
mvn
A pretty large fraction of users use Java, but a few features are still not
available in it. JdbcRDD is one of them -- this functionality will likely be
superseded by Spark SQL when we add JDBC as a data source. In the meantime, to
use it, I'd recommend writing a class in Scala that has
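The suggestion is cut off here, but as a rough sketch of the idea (the helper name, query, and bounds are hypothetical), a small Scala object can construct the JdbcRDD so that Java code only has to call a single method:
import java.sql.DriverManager
import org.apache.spark.SparkContext
import org.apache.spark.rdd.JdbcRDD
object JdbcHelper {
  // The SQL must contain two '?' placeholders, which JdbcRDD fills in with
  // each partition's slice of the [lowerBound, upperBound] key range.
  def load(sc: SparkContext, url: String): JdbcRDD[(Int, String)] =
    new JdbcRDD(
      sc,
      () => DriverManager.getConnection(url),
      "SELECT id, name FROM people WHERE id >= ? AND id <= ?",
      1L,    // lowerBound
      1000L, // upperBound
      4,     // numPartitions
      rs => (rs.getInt(1), rs.getString(2)))
}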
The overridable methods of RDD are marked as @DeveloperApi, which means that
these are internal APIs used by people that might want to extend Spark, but are
not guaranteed to remain stable across Spark versions (unlike Spark's public
APIs).
BTW, if you want a way to do this that does not
[
https://issues.apache.org/jira/browse/SPARK-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14178025#comment-14178025
]
Matei Zaharia commented on SPARK-3466:
--
Ah, I see, that concern makes sense
It seems that ++ does the right thing on arrays of longs, and gives you another
one:
scala> val a = Array[Long](1,2,3)
a: Array[Long] = Array(1, 2, 3)
scala> val b = Array[Long](1,2,3)
b: Array[Long] = Array(1, 2, 3)
scala> a ++ b
res0: Array[Long] = Array(1, 2, 3, 1, 2, 3)
scala> res0.getClass
[
https://issues.apache.org/jira/browse/SPARK-3467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matei Zaharia updated SPARK-3467:
-
Assignee: Davies Liu
Python BatchedSerializer should dynamically lower batch size for large
[
https://issues.apache.org/jira/browse/SPARK-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177824#comment-14177824
]
Matei Zaharia commented on SPARK-3655:
--
I believe you can build this on top
BTW several people asked about registration and student passes. Registration
will open in a few weeks, and like in previous Spark Summits, I expect there to
be a special pass for students.
Matei
On Oct 18, 2014, at 9:52 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
After successful
I'd also wait a bit until these are gone. Jetty is unfortunately a much hairier
topic by the way, because the Hadoop libraries also depend on Jetty. I think it
will be hard to update. However, a patch that shades Jetty might be nice to
have, if that doesn't require shading a lot of other stuff.
After successful events in the past two years, the Spark Summit conference has
expanded for 2015, offering both an event in New York on March 18-19 and one in
San Francisco on June 15-17. The conference is a great chance to meet people
from throughout the Spark community and see the latest