Hello,
A coworker was having a problem with a big Spark job failing after several
hours when one of the executors would segfault. That problem aside, I
speculated that her job would be more robust against these kinds of executor
crashes if she used replicated RDD storage. She's using off heap
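What I had in mind is roughly the following (just a sketch; someRdd is a placeholder, and I haven't checked how replication interacts with her off-heap setup):

import org.apache.spark.storage.StorageLevel

// Persist with a replicated storage level (the _2 suffix means two replicas),
// so blocks from a crashed executor can be re-read from the surviving copy
// instead of being recomputed from scratch.
val cached = someRdd.persist(StorageLevel.MEMORY_AND_DISK_SER_2)
cached.count()  // materialize (and replicate) the blocks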
FWIW, this is an essential feature to our use of Spark, and I'm surprised it's not advertised clearly as a limitation in the documentation. All I've found about running Spark 1.3 on 2.11 is here:
http://spark.apache.org/docs/latest/building-spark.html#building-for-scala-211
Also, I'm experiencing
not not-ready; it's
just not the Scala 2.11.6 REPL. Still, sure, I'd favor breaking the
unofficial support to at least make the latest Scala 2.11 the unbroken
one.
On Fri, Apr 17, 2015 at 7:58 AM, Michael Allman mich...@videoamp.com wrote:
FWIW, this is an essential feature to our use of Spark
at 10:31 PM, Michael Allman mich...@videoamp.com wrote:
Hmm... I don't follow. The 2.11.x series is supposed to be binary
compatible against user code. Anyway, I was building Spark against 2.11.2
and still saw the problems with the REPL. I've created a bug report:
https://issues.apache.org/jira
Hello,
We're running a spark sql thriftserver that several users connect to with
beeline. One limitation we've run into is that the current working database
(set with "USE <db>") is shared across all connections. So changing the
database on one connection changes the database for all connections.
Hi Pierre,
I'm setting the Parquet (and HDFS) block size as follows:
val ONE_GB = 1024 * 1024 * 1024
sc.hadoopConfiguration.setInt("dfs.blocksize", ONE_GB)
sc.hadoopConfiguration.setInt("parquet.block.size", ONE_GB)
Here, sc is a reference to the Spark context. I've tested this and it
be that it breaks the concept of window operations which are in
Spark.
Thanks,
Jayant
On Tue, Oct 7, 2014 at 10:19 PM, Michael Allman [hidden email] wrote:
Hi Andrew,
The use case I have in mind is batch data serialization to HDFS, where sizing
files to a certain HDFS block size
Hi Andy,
This sounds awesome. Please keep us posted. Meanwhile, can you share a link to
your project? I wasn't able to find it.
Cheers,
Michael
On Oct 8, 2014, at 3:38 AM, andy petrella andy.petre...@gmail.com wrote:
Heya
You can check out Zeppelin or my fork of the Scala notebook.
I'm
Ummm... what's helium? Link, plz?
On Oct 8, 2014, at 9:13 AM, Stephen Boesch java...@gmail.com wrote:
@kevin, Michael,
Second that: interested in seeing the Zeppelin. Please use helium though.
2014-10-08 7:57 GMT-07:00 Michael Allman mich...@videoamp.com:
Hi Andy,
This sounds awesome
We are hoping to do some upgrades of our Parquet support in the near future.
On Tue, Oct 7, 2014 at 10:33 PM, Michael Allman mich...@videoamp.com wrote:
Hello,
I was interested in testing Parquet V2 with Spark SQL, but noticed after some
investigation that the parquet writer that Spark SQL uses is fixed at V1 here:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTableSupport.scala#L350.
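For reference, this is roughly how I'd expect to request the V2 writer through plain parquet-mr (the key name comes from ParquetOutputFormat, and I'm assuming it's still honored there; Spark SQL's writer ignores it today because the version is hardcoded):

// Ask parquet-mr for the V2 writer via the Hadoop configuration. This is only
// a sketch: Spark SQL's own Parquet writer currently hardcodes V1 and will not
// pick this setting up.
sc.hadoopConfiguration.set("parquet.writer.version", "PARQUET_2_0")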
Hi,
I also have a use for count-based windowing. I'd like to process data in
batches by size as opposed to time. Is this feature on the development
roadmap? Is there a JIRA ticket for it?
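In the meantime, the closest approximation I've found for a plain batch job is to chunk each partition by element count. This is just a sketch (rdd, batchSize, and processBatch are placeholders), not a real count-based streaming window:

// Rough per-partition approximation of count-based batching: handle each
// partition in groups of at most batchSize records. Counts are per partition,
// not global, so this is not equivalent to a streaming window.
val batchSize = 1000
rdd.mapPartitions(_.grouped(batchSize)).foreach { batch =>
  processBatch(batch)  // placeholder for whatever per-batch work is needed
}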
Thank you,
Michael
I just ran a runtime performance comparison between 0.9.0-incubating and your
als branch. I saw a 1.5x improvement in performance.
I've been thoroughly investigating this issue over the past couple of days
and have discovered quite a bit. For one thing, there is definitely (at
least) one issue/bug in the Spark implementation that leads to incorrect
results for models generated with rank 1 or a large number of iterations.
I
Hello,
I've been trying to run an iterative Spark job that spills 1+ GB to disk
per iteration on a system with limited disk space. I believe there's
enough space if Spark would clean up unused data from previous iterations,
but as it stands the number of iterations I can run is limited by
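In case it's useful context, the pattern I'm experimenting with is to explicitly unpersist the previous iteration's RDD and checkpoint every few iterations so old blocks and lineage can actually be dropped. Just a sketch with made-up names (initialRdd, step, numIterations), and I haven't confirmed it fixes the disk growth:

// Keep only the latest iteration cached, and periodically checkpoint to
// truncate the lineage so older intermediate data can be cleaned up.
sc.setCheckpointDir("/tmp/spark-checkpoints")  // placeholder path
var current = initialRdd.cache()
for (i <- 1 to numIterations) {
  val next = step(current).cache()
  if (i % 5 == 0) next.checkpoint()  // request lineage truncation before the action
  next.count()                       // materialize next (and checkpoint when requested)
  current.unpersist()                // let Spark drop the previous iteration's blocks
  current = next
}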
Hi,
I'm implementing a recommender based on the algorithm described in
http://www2.research.att.com/~yifanhu/PUB/cf.pdf. This algorithm forms the
basis for Spark's ALS implementation for data sets with implicit features.
The data set I'm working with is proprietary and I cannot share it,
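For what it's worth, I'm invoking MLlib's implicit-feedback ALS roughly like this (the input path and parameters below are placeholders, not the real ones, since I can't share the data set):

import org.apache.spark.mllib.recommendation.{ALS, Rating}

// Build (user, product, confidence) triples and train an implicit-feedback
// ALS model (the Hu/Koren/Volinsky formulation). Numbers are placeholders.
val ratings = sc.textFile("hdfs:///path/to/ratings")
  .map(_.split(','))
  .map { case Array(u, p, r) => Rating(u.toInt, p.toInt, r.toDouble) }
val model = ALS.trainImplicit(ratings, 10, 15, 0.01, 40.0)  // rank, iterations, lambda, alpha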