No, that isn't necessarily enough to be considered a blocker. A blocker
would be something that would have large negative effects on a significant
number of people trying to run Spark. Arguably, something that prevents a
minority of Spark developers from running unit tests on one OS does not
meet that bar.
Here is the fix https://github.com/apache/spark/pull/13868
From: Reynold Xin [mailto:r...@databricks.com]
Sent: Wednesday, June 22, 2016 6:43 PM
To: Ulanov, Alexander
Cc: Mark Hamstra; Marcelo Vanzin; dev@spark.apache.org
Hi All,
I have tried Spark SQL on branch-2.0 and encountered an unexpected
problem:
Operation not allowed: ROW FORMAT DELIMITED is only compatible with
'textfile', not 'orc'(line 1, pos 0)
The SQL is like:
CREATE TABLE IF NOT EXISTS test.test_orc
(
...
)
PARTITIONED BY (xxx)
ROW
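For reference, a form of the statement that avoids this error (a sketch only;
the column and partition names here are placeholders, since the original
definition was elided) drops the ROW FORMAT DELIMITED clause, which Spark 2.0
accepts only for text-backed tables, and relies on STORED AS ORC alone:

```sql
-- Hypothetical columns; ROW FORMAT DELIMITED removed because Spark 2.0
-- rejects it for non-text formats such as ORC.
CREATE TABLE IF NOT EXISTS test.test_orc (
  id BIGINT,
  name STRING
)
PARTITIONED BY (dt STRING)
STORED AS ORC;
```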
Thank you Holden, I look forward to watching your talk!
On Wed, Jun 22, 2016 at 7:12 PM Holden Karau wrote:
PySpark RDDs are (on the Java side) essentially RDDs of pickled objects,
and mostly (but not entirely) opaque to the JVM. It is possible (by using
some internals) to pass a PySpark DataFrame to a Scala library (you may or
may not find the talk I gave at Spark Summit useful
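The "RDD of pickled objects" point can be illustrated without Spark at all:
each Python element is serialized with pickle into a byte string, and byte
strings are essentially all the JVM side sees (a minimal sketch using only the
standard pickle module; PySpark's real serializers add batching on top):

```python
import pickle

# Roughly what PySpark ships to the JVM: opaque pickled byte strings.
rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
pickled = [pickle.dumps(r) for r in rows]

# To the JVM these are just byte arrays -- no structure is visible.
assert all(isinstance(b, bytes) for b in pickled)

# Only the Python side can turn them back into objects.
restored = [pickle.loads(b) for b in pickled]
assert restored == rows
```

This is why a DataFrame (which keeps data in JVM rows with a known schema) is
a much easier interchange point with a Scala library than a plain RDD.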
Hi All,
I've developed a spark module in scala that I would like to add a python
port for. I want to be able to allow users to create a pyspark RDD and send
it to my system. I've been looking into the pyspark source code as well as
py4J and was wondering if there has been anything like this
Alex - if you have access to a Windows box, can you fix the issue? I'm not
sure how many Spark contributors have Windows boxes.
On Wed, Jun 22, 2016 at 5:56 PM, Ulanov, Alexander wrote:
Spark unit tests fail on Windows in Spark 2.0. It can be considered a
blocker, since there are people who develop Spark on Windows. The
referenced issue is indeed Minor and has nothing to do with unit tests.
From: Mark Hamstra [mailto:m...@clearstorydata.com]
Sent: Wednesday, June 22, 2016
It's also marked as Minor, not Blocker.
On Wed, Jun 22, 2016 at 4:07 PM, Marcelo Vanzin wrote:
> On Wed, Jun 22, 2016 at 4:04 PM, Ulanov, Alexander wrote:
> > -1
> >
> > Spark Unit tests fail on Windows. Still not resolved, though marked as
> > resolved.
On Wed, Jun 22, 2016 at 4:04 PM, Ulanov, Alexander
wrote:
> -1
>
> Spark Unit tests fail on Windows. Still not resolved, though marked as
> resolved.
To be pedantic, it's marked as a duplicate
(https://issues.apache.org/jira/browse/SPARK-15899), which doesn't mean
it's fixed: SPARK-15893 is resolved as a duplicate of SPARK-15899, and
SPARK-15899 is Unresolved.
-1
Spark Unit tests fail on Windows. Still not resolved, though marked as resolved.
https://issues.apache.org/jira/browse/SPARK-15893
From: Reynold Xin [mailto:r...@databricks.com]
Sent: Tuesday, June 21, 2016 6:27 PM
To: dev@spark.apache.org
Subject: [VOTE] Release Apache Spark 2.0.0 (RC1)
+1
On Wed, Jun 22, 2016 at 1:07 PM, Kousuke Saruta wrote:
> +1 (non-binding)
+1 (non-binding)
On 2016/06/23 4:53, Reynold Xin wrote:
> +1 myself
+1
On Jun 22, 2016, at 1:14 PM, Michael Armbrust wrote:
+1
On Wed, Jun 22, 2016 at 11:33 AM, Jonathan Kelly wrote:
> +1
+1
On Wed, Jun 22, 2016 at 10:41 AM Tim Hunter wrote:
> +1 This release passes all tests on the graphframes and tensorframes
> packages.
>> If we're considering backporting changes for
You should look at it at both levels: there is one bloom filter for ORC
data and one for data in memory.
It is already a good step towards integrating the format and the in-memory
representation for columnar data.
> On 22 Jun 2016, at 14:01, BaiRan wrote:
>
> After building
+1 This release passes all tests on the graphframes and tensorframes
packages.
On Wed, Jun 22, 2016 at 7:19 AM, Cody Koeninger wrote:
> If we're considering backporting changes for the 0.8 kafka
> integration, I am sure there are people who would like to get
Yeah, I am +1 for including Kafka 0.10 integration as well. We had to wait
for Kafka 0.10 because there were incompatibilities between the Kafka 0.9
and 0.10 API. And, yes, the code for 0.8.0 remains unchanged so there
shouldn't be any regression for existing users. It's only new code for 0.10.
of course, on my first day back from vacation, i notice that the
jenkins process got wedged immediately upon my visiting the page.
one quick jenkins/httpd restart later and we're back up and building.
sorry for any inconvenience!
shane
+1 for 0.10 support. this is huge.
On Wed, Jun 22, 2016 at 8:17 AM, Cody Koeninger wrote:
> Luciano knows there are publicly available examples of how to use the
> 0.10 connector, including TLS support, because he asked me about it
> and I gave him a link
Luciano knows there are publicly available examples of how to use the
0.10 connector, including TLS support, because he asked me about it
and I gave him a link
https://github.com/koeninger/kafka-exactly-once/blob/kafka-0.9/src/main/scala/example/TlsStream.scala
If any committer at any time had
On Wed, Jun 22, 2016 at 7:46 AM, Cody Koeninger wrote:
> As far as I know the only thing blocking it at this point is lack of
> committer review / approval.
Hm, I thought that was to be added for 2.0. Imran, I know you may have
been working alongside Mark on it; what do you think?
TD / Reynold would you object to it for 2.0?
On Wed, Jun 22, 2016 at 3:46 PM, Cody Koeninger wrote:
> As far as I know the only thing blocking it at
As far as I know the only thing blocking it at this point is lack of
committer review / approval.
It's technically adding a new feature after spark code-freeze, but it
doesn't change existing code, and the kafka project didn't release
0.10 until the end of may.
On Wed, Jun 22, 2016 at 9:39 AM,
I profess ignorance again, though I really should know by now: what's
opposing that? I personally thought this was going to be in 2.0 and
somehow didn't notice it wasn't ...
On Wed, Jun 22, 2016 at 3:29 PM, Cody Koeninger wrote:
> I don't have a vote, but I'd just like to
For the clueless (like me):
https://bahir.apache.org/#home
Apache Bahir provides extensions to distributed analytic platforms such as
Apache Spark.
Initially Apache Bahir will contain streaming connectors that were a part
of Apache Spark prior to version 2.0:
- streaming-akka
-
I don't have a vote, but I'd just like to reiterate that I think kafka
0.10 support should be added to a 2.0 release candidate; if not now,
then well before release.
- it's a completely standalone jar, so shouldn't break anyone who's
using the existing 0.8 support
- it's like the 5th highest
Created a JIRA issue https://issues.apache.org/jira/browse/SPARK-16131 and
PR @ https://github.com/apache/spark/pull/13842
On Fri, Jun 17, 2016 at 5:19 AM, Sean Owen wrote:
> I think that's OK to change, yes. I don't see why it's necessary to
> init log_ the way it is now.
Good call, probably worth back-porting, I'll try to do that. I don't
think it blocks a release, but would be good to get into a next RC if
any.
On Wed, Jun 22, 2016 at 11:38 AM, Pete Robbins wrote:
> This has failed on our 1.6 stream builds regularly.
>
After building a bloom filter on existing data, does the Spark engine
utilise the bloom filter during query processing?
Is there any plan for predicate push down using bloom filters in ORC /
Parquet?
Thanks
Ran
> On 22 Jun, 2016, at 10:48 am, Reynold Xin wrote:
>
> SPARK-12818
Hi All,
I am running a Spark application with 1.8TB of data (which is stored in
Hive table format). I am reading the data using HiveContext and processing
it. The cluster has 5 nodes total, 25 cores per machine and 250GB per
node. I am launching the application with 25 executors with 5 cores each
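For what it's worth, the quoted numbers line up as follows (simple
arithmetic, assuming executors spread evenly across nodes and ignoring
YARN/OS memory overhead):

```python
# Cluster figures as stated in the message above.
nodes, cores_per_node, mem_per_node_gb = 5, 25, 250
executors, cores_per_executor = 25, 5

executors_per_node = executors // nodes                        # 5 per node
cores_used_per_node = executors_per_node * cores_per_executor  # 25: fully subscribed
mem_per_executor_gb = mem_per_node_gb // executors_per_node    # ~50 GB each

assert executors_per_node == 5
assert cores_used_per_node == cores_per_node   # every core is claimed
assert mem_per_executor_gb == 50               # upper bound before overhead
```

So the layout fully subscribes the CPUs, and each executor can be given at
most about 50GB before accounting for the OS and any cluster-manager
overhead.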
This has failed on our 1.6 stream builds regularly
(https://issues.apache.org/jira/browse/SPARK-6005). Looks fixed in 2.0?
On Wed, 22 Jun 2016 at 11:15 Sean Owen wrote:
> Oops, one more in the "does anybody else see this" department:
>
> - offset recovery *** FAILED ***
>
Oops, one more in the "does anybody else see this" department:
- offset recovery *** FAILED ***
recoveredOffsetRanges.forall(((or: (org.apache.spark.streaming.Time,
Array[org.apache.spark.streaming.kafka.OffsetRange])) =>
I'm fairly convinced this error and others that appear timestamp-related
are an environment problem. This test and method have been
present for several Spark versions, without change. I reviewed the
logic and it seems sound, explicitly setting the time zone correctly.
I am not sure why it behaves
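The kind of environment sensitivity being ruled out here is easy to
reproduce outside Spark: timestamp code that relies on the process-local
time zone gives different answers on different machines, while pinning the
zone explicitly is deterministic (a generic illustration, not the Spark
test itself):

```python
from datetime import datetime, timezone

epoch_ms = 1466640000000  # a fixed instant

# Environment-dependent: the result varies with the machine's local zone.
local = datetime.fromtimestamp(epoch_ms / 1000)

# Deterministic: the zone is set explicitly, so every machine agrees.
utc = datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc)
assert utc.isoformat() == "2016-06-23T00:00:00+00:00"
```

A test that only ever constructs zone-pinned timestamps, as the logic
reviewed above does, should pass regardless of the build machine's TZ
setting.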