the meaning of partition column and bucket column please?

2017-06-19 Thread ??????????
Hi all, The code of Column has members named isPartition and isBucket. What do they mean, please? And when should they be set to true? Thank you in advance. Fei Shao

Re: the scheme in stream reader

2017-06-19 Thread ??????????
Hi, I have submitted a JIRA for this issue. The link is https://issues.apache.org/jira/browse/SPARK-21147 Thanks, Fei Shao ---Original--- From: "Michael Armbrust" Date: 2017/6/20 03:06:49 To: "??"<1427357...@qq.com>; Cc:

Unsubscribe

2017-06-19 Thread praba karan

Re: [build system] immediate emergency updates and reboot to deal w/stack clash vulnerability

2017-06-19 Thread shane knapp
i've updated the two ubuntu workers (amp-jenkins-staging-01 and -02), and am still twiddling my thumbs and waiting for centos packages to be released. i'm guessing we'll have those some time today, and will update everyone then. On Mon, Jun 19, 2017 at 11:02 AM, shane knapp

Re: cannot call explain or show on dataframe in structured streaming addBatch dataframe

2017-06-19 Thread Michael Armbrust
There is a little bit of weirdness to how we override the default query planner to replace it with an incrementalizing planner. As such, calling any operation that changes the query plan (such as a LIMIT) would cause it to revert to the batch planner and return the wrong answer. We should fix

Re: the scheme in stream reader

2017-06-19 Thread Michael Armbrust
The socket source can't know how to parse your data. I think the right thing would be for it to throw an exception saying that you can't set the schema here. Would you mind opening a JIRA ticket? If you are trying to parse data from something like JSON then you should use `from_json` on the

Re: [build system] immediate emergency updates and reboot to deal w/stack clash vulnerability

2017-06-19 Thread shane knapp
ok, we're in a holding pattern as the centos packages haven't been released yet. once they're out i'll update this thread and start rebooting. On Mon, Jun 19, 2017 at 10:52 AM, shane knapp wrote: > jenkins is affected: > >

[build system] immediate emergency updates and reboot to deal w/stack clash vulnerability

2017-06-19 Thread shane knapp
jenkins is affected: https://www.qualys.com/2017/06/19/stack-clash/stack-clash.txt https://access.redhat.com/security/vulnerabilities/stackguard i'm shutting down jenkins, applying patches and rebooting immediately. ETA unknown. hopefully quick. i'll update here when i find out.

Re: Question: why is Externalizable used?

2017-06-19 Thread Reynold Xin
I responded on the ticket. On Mon, Jun 19, 2017 at 2:36 AM, Sean Owen wrote: > Just wanted to call attention to this question, mostly because I'm curious: > https://github.com/apache/spark/pull/18343#issuecomment-309388668 > > Why is Externalizable (+ KryoSerializable) used

Re: Output Committers for S3

2017-06-19 Thread Ryan Blue
I agree, the problem is that Spark is trying to be safe and avoid the direct committer. We also modify Spark to avoid its logic. We added a property that causes Spark to always use the output committer if the destination is in S3. Our committers are also slightly different and will get an

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-19 Thread Liang-Chi Hsieh
I mean it is not a bug that had been fixed before this feature was added. Of course the kryo serializer with 2000+ partitions was working before this feature. Koert Kuipers wrote > If a feature added recently breaks using kryo serializer with 2000+ > partitions then how can it not be a regression? I mean I

Unsubscribe

2017-06-19 Thread vijendra rana

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-19 Thread Koert Kuipers
If a feature added recently breaks using kryo serializer with 2000+ partitions then how can it not be a regression? I mean I use kryo with more than 2000 partitions all the time, and it worked before. Or was I simply not hitting this bug because there are other conditions that also need to be

Question: why is Externalizable used?

2017-06-19 Thread Sean Owen
Just wanted to call attention to this question, mostly because I'm curious: https://github.com/apache/spark/pull/18343#issuecomment-309388668 Why is Externalizable (+ KryoSerializable) used instead of Serializable? And should the first two always go together?
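
For readers who want the distinction behind this question spelled out, here is a minimal Java sketch of the two standard interfaces. With Serializable the JVM writes the fields itself; with Externalizable the class writes and reads its own format and must expose a public no-arg constructor. The class names (ExternalizableDemo, PlainStatus, CompactStatus) and their fields are hypothetical and are not taken from Spark or from the linked pull request. The sketch does not cover KryoSerializable, which belongs to the separate Kryo library, and it makes no claim about why Spark's own classes were written the way they were.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;

// Hypothetical demo classes; none of these names come from Spark.
class ExternalizableDemo {

    // Serializable: the JVM's default mechanism writes the fields reflectively.
    static class PlainStatus implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String host;
        private final long[] sizes;

        PlainStatus(String host, long[] sizes) {
            this.host = host;
            this.sizes = sizes;
        }

        String host() { return host; }
        long[] sizes() { return sizes; }
    }

    // Externalizable: the class defines its own wire format.
    // Deserialization calls the public no-arg constructor, then readExternal.
    public static class CompactStatus implements Externalizable {
        private String host;
        private long[] sizes;

        public CompactStatus() { }

        public CompactStatus(String host, long[] sizes) {
            this.host = host;
            this.sizes = sizes;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeUTF(host);
            out.writeInt(sizes.length);
            for (long size : sizes) {
                out.writeLong(size);
            }
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            host = in.readUTF();
            sizes = new long[in.readInt()];
            for (int i = 0; i < sizes.length; i++) {
                sizes[i] = in.readLong();
            }
        }

        String host() { return host; }
        long[] sizes() { return sizes; }
    }

    // Serialize an object to a byte array with standard Java serialization.
    static byte[] serialize(Object value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray();
    }

    // Rebuild an object from bytes produced by serialize().
    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    // Serialize then deserialize, wrapping checked exceptions for convenience.
    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T value) {
        try {
            return (T) deserialize(serialize(value));
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) throws IOException {
        long[] sizes = {10L, 20L, 30L};
        PlainStatus plain = roundTrip(new PlainStatus("host-1", sizes));
        CompactStatus compact = roundTrip(new CompactStatus("host-1", sizes));
        System.out.println("plain:   " + plain.host() + " " + Arrays.toString(plain.sizes()));
        System.out.println("compact: " + compact.host() + " " + Arrays.toString(compact.sizes()));
        System.out.println("plain bytes:   " + serialize(plain).length);
        System.out.println("compact bytes: " + serialize(compact).length);
    }
}
```

Both classes survive the same ObjectOutputStream round trip; the difference is only in who decides what goes on the wire. Running main prints the serialized sizes so the two encodings can be compared directly.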

Re: [VOTE] Apache Spark 2.2.0 (RC4)

2017-06-19 Thread Liang-Chi Hsieh
I think it's not. This is a feature added recently. Hyukjin Kwon wrote > Is this a regression BTW? I am just curious. > > On 19 Jun 2017 1:18 pm, "Liang-Chi Hsieh" > viirya@ > wrote: > > -1. When using kryo serializer and partition number is greater than 2000. > There seems a NPE issue