Thanks for the pointer Peter, that change will indeed fix this bug and
it looks like it will make it into the upcoming 1.3.0 release.
@Evan, for reference, completeness and posterity:
Just to be clear - you're currently calling .persist() before you pass data
to LogisticRegressionWithLBFGS?
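For context on why the question matters: LogisticRegressionWithLBFGS makes multiple passes over the training data, so an unpersisted input RDD gets recomputed from its lineage on every iteration. A minimal sketch of the pattern being asked about; the RDD name and storage level here are illustrative, not from the thread:

```scala
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// `training` is assumed to be an existing RDD[LabeledPoint].
def train(training: RDD[LabeledPoint]) = {
  // Persist before handing the data to the iterative optimizer,
  // so each LBFGS pass reads cached blocks instead of recomputing lineage.
  val cached = training.persist(StorageLevel.MEMORY_AND_DISK)
  new LogisticRegressionWithLBFGS().setNumClasses(2).run(cached)
}
```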
Running ec2 launch scripts gives me the following error:
ssl.SSLError: [Errno 1] _ssl.c:504: error:14090086:SSL
routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
Full stack trace at
https://gist.github.com/insidedctm/4d41600bc22560540a26
I’m running OSX Mavericks 10.9.5
I’ll
I meant using |saveAsParquetFile|. As for partition number, you can
always control it with |spark.sql.shuffle.partitions| property.
Cheng
On 2/23/15 1:38 PM, nitin wrote:
I believe calling processedSchemaRdd.persist(DISK) and
processedSchemaRdd.checkpoint() only persists data and I will lose
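A sketch of what Cheng's two suggestions look like together, using the 1.2/1.3-era API names mentioned in the thread; the output path and partition count are placeholders:

```scala
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc) // `sc`: an existing SparkContext

// Partition count for shuffles introduced by Spark SQL operations;
// this controls how many part-files the write below produces.
sqlContext.setConf("spark.sql.shuffle.partitions", "64")

// `processedSchemaRdd` is the SchemaRDD from the quoted message.
processedSchemaRdd.saveAsParquetFile("hdfs:///tmp/processed.parquet")
```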
This vote was supposed to close on Saturday but it looks like no PMCs voted
(other than the implicit vote from Patrick). Was there a discussion offline
to cut an RC2? Was the vote extended?
On Mon, Feb 23, 2015 at 6:59 AM, Robin East robin.e...@xense.co.uk wrote:
Running ec2 launch scripts
Yes, recently we improved ParquetRelation2 quite a bit. Spark SQL uses its
own Parquet support to read partitioned Parquet tables declared in Hive
metastore. Only writing to partitioned tables is not covered yet. These
improvements will be included in Spark 1.3.0.
Just created SPARK-5948 to
Thanks Sean. I glossed over the comment about SPARK-5669.
On Mon, Feb 23, 2015 at 9:05 AM, Sean Owen so...@cloudera.com wrote:
Yes my understanding from Patrick's comment is that this RC will not
be released, but, to keep testing. There's an implicit -1 out of the
gates there, I believe, and
Hi Mark,
For input streams like text file stream, only RDDs can be recovered from
the checkpoint, not missed files; if a file is missed, an exception will
actually be raised. If you use HDFS, HDFS will guarantee no data loss
since it keeps 3 copies. Otherwise, user logic has to guarantee no file is deleted
The first concern for Spark will probably be to ensure that we still build
and test against Python 2.6, since that's the minimum version of Python we
support.
Otherwise this seems OK. We use numpy and other Python packages in PySpark,
but I don't think we're pinned to any particular version of
So actually, the list of blockers on JIRA is a bit outdated. These
days I won't cut RC1 unless there are no known issues that I'm aware
of that would actually block the release (that's what the snapshot
ones are for). I'm going to clean those up and push others to do so
also.
The main issues I'm
Hello,
I was interested in creating a StreamingContext textFileStream based job,
which runs for long durations, and can also recover from prolonged driver
failure... It seems like StreamingContext checkpointing is mainly used for
the case when the driver dies during the processing of an RDD, and
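For reference, the driver-recovery pattern the Streaming programming guide describes for this case is StreamingContext.getOrCreate: on a clean start it builds a fresh context, and after a driver failure it reconstructs the context (and recoverable RDDs) from the checkpoint directory. A sketch with placeholder paths and batch interval:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val checkpointDir = "hdfs:///checkpoints/textFileStreamJob" // placeholder

def createContext(): StreamingContext = {
  val conf = new SparkConf().setAppName("textFileStreamJob")
  val ssc = new StreamingContext(conf, Seconds(30))
  ssc.checkpoint(checkpointDir)
  // Monitors the directory for new files; per the caveat above,
  // files deleted before recovery cannot be replayed.
  val lines = ssc.textFileStream("hdfs:///incoming")
  lines.count().print()
  ssc
}

// Recovers from checkpointDir if present, otherwise calls createContext.
val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()
```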
It's only been reported on this thread by Tom, so far.
On Mon, Feb 23, 2015 at 10:29 AM, Marcelo Vanzin van...@cloudera.com wrote:
Hey Patrick,
Do you have a link to the bug related to Python and Yarn? I looked at
the blockers in Jira but couldn't find it.
On Mon, Feb 23, 2015 at 10:18 AM, Patrick Wendell pwend...@gmail.com wrote:
So actually, the list of blockers on JIRA is a bit outdated. These
days I won't cut RC1
Hi Tom, are you using an sbt-built assembly by any chance? If so, take
a look at SPARK-5808.
I haven't had any problems with the maven-built assembly. Setting
SPARK_HOME on the executors is a workaround if you want to use the sbt
assembly.
On Fri, Feb 20, 2015 at 2:56 PM, Tom Graves
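One way to apply the SPARK_HOME-on-executors workaround programmatically; SparkConf.setExecutorEnv is a real API, but the path is a placeholder and whether this is the right knob depends on the deploy mode:

```scala
import org.apache.spark.SparkConf

// Workaround for SPARK-5808 when running an sbt-built assembly:
// make SPARK_HOME visible in the executors' environment.
val conf = new SparkConf()
  .setAppName("myApp")
  .setExecutorEnv("SPARK_HOME", "/opt/spark") // placeholder path
```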
good morning, developers!
TL;DR:
i will be installing anaconda and setting it in the system PATH so that
your python will default to 2.7, with anaconda also taking over management
of all of the sci-py packages. this is potentially a big change, so i'll be
testing locally on my staging instance
Hi Jerry,
Thanks for the quick response! Looks like I'll need to come up with an
alternative solution in the meantime, since I'd like to avoid the other
input streams + WAL approach. :)
Thanks again,
Mark.
On Mon, Feb 23, 2015 at 11:36 AM, Nicholas Chammas
nicholas.cham...@gmail.com wrote:
The first concern for Spark will probably be to ensure that we still build
and test against Python 2.6, since that's the minimum version of Python we
support.
sounds good... we can set up separate 2.6
On Sun, Feb 22, 2015 at 11:20 PM, Mark Hamstra m...@clearstorydata.com
wrote:
So what are we expecting of Hive 0.12.0 builds with this RC? I know not
every combination of Hadoop and Hive versions, etc., can be supported, but
even an example build from the Building Spark page isn't looking too
Nothing that I can point to, so this may only be a problem in test scope.
I am looking at a problem where some UDFs that run with 0.12 fail with
0.13; but that problem is already present in Spark 1.2.x, so it's not a
blocking regression for 1.3. (Very likely a HiveFunctionWrapper serde
problem,
+1 (non-binding)
For: https://issues.apache.org/jira/browse/SPARK-3660
. Docs OK
. Example code is good
-Soumitra.
On Mon, Feb 23, 2015 at 10:33 AM, Marcelo Vanzin van...@cloudera.com
wrote:
Hi Tom, are you using an sbt-built assembly by any chance? If so, take
a look at SPARK-5808.
I
Ah, sorry for not being clear enough.
So now in Spark 1.3.0, we have two Parquet support implementations: the
old one is tightly coupled with the Spark SQL framework, while the new
one is based on the data sources API. In both versions, we try to intercept
operations over Parquet tables
Hey all,
I found a major issue where JobProgressListener (a listener used to keep
track of jobs for the web UI) never forgets stages in one of its data
structures. This is a blocker for long running applications.
https://issues.apache.org/jira/browse/SPARK-5967
I am testing a fix for this right
My bad, I had once fixed all Hive 12 test failures in PR #4107, but didn't
get time to get it merged.
Considering the release is close, I can cherry-pick those Hive 12 fixes
from #4107 and open a more surgical PR soon.
Cheng
On 2/24/15 4:18 AM, Michael Armbrust wrote:
On Sun, Feb 22, 2015 at