Re: Clean out https://dist.apache.org/repos/dist/dev/spark/ ?

2019-01-24 Thread shane knapp
tomorrow i will continue the purge. :)

On Thu, Jan 24, 2019 at 6:13 PM Sean Owen wrote:
> No and we could retire 2.2 now too but wouldn't hurt to keep it a bit
> longer in case we have to make a critical release even though it's EOL.
>
> On Thu, Jan 24, 2019, 7:05 PM shane knapp
>> s/job/jobs

Re: Clean out https://dist.apache.org/repos/dist/dev/spark/ ?

2019-01-24 Thread shane knapp
s/job/jobs

these are for the spark-(master|branch-X)-docs builds, so right now i am
talking about removing 6 builds for the following branches:

1.6
2.0
2.1
2.3
2.4
master

in fact, do we even need ANY builds for 1.6, 2.0 and 2.1?

On Thu, Jan 24, 2019 at 5:57 PM Sean Owen wrote:
> I think we

Re: Clean out https://dist.apache.org/repos/dist/dev/spark/ ?

2019-01-24 Thread Sean Owen
No and we could retire 2.2 now too but wouldn't hurt to keep it a bit
longer in case we have to make a critical release even though it's EOL.

On Thu, Jan 24, 2019, 7:05 PM shane knapp
> s/job/jobs
>
> these are for the spark-(master|branch-X)-docs builds, so right now i am
> talking about removing

Re: Clean out https://dist.apache.org/repos/dist/dev/spark/ ?

2019-01-24 Thread Sean Owen
I think we can just remove this job.

On Thu, Jan 24, 2019 at 6:44 PM shane knapp wrote:
>
> On Sun, Jan 13, 2019 at 11:22 AM Felix Cheung wrote:
>>
>> Eh, yeah, like the one with signing, I think doc build is mostly useful when
>> a) right before we do a release or during the RC resets; b)

Re: Clean out https://dist.apache.org/repos/dist/dev/spark/ ?

2019-01-24 Thread shane knapp
On Sun, Jan 13, 2019 at 11:22 AM Felix Cheung wrote:
> Eh, yeah, like the one with signing, I think doc build is mostly useful
> when a) right before we do a release or during the RC resets; b) someone
> makes a huge change to doc and wants to check
>
> Not sure we need this nightly?

ohai! i

Re: moving the spark jenkins job builder repo from dbricks --> spark

2019-01-24 Thread shane knapp
looking here:
https://dist.apache.org/repos/dist/dev/spark/3.0.0-SNAPSHOT-2019_01_24_10_34-69dab94-docs/

and here:
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-docs/5312/console

this does confirm that these artifacts are indeed created by the packaging docs

Re: moving the spark jenkins job builder repo from dbricks --> spark

2019-01-24 Thread Sean Owen
Are these the docs builds creating the SNAPSHOT docs at
https://dist.apache.org/repos/dist/dev/spark/ ? I think, from a thread last
month, these aren't used and should probably just be stopped.

On Thu, Jan 24, 2019 at 3:34 PM shane knapp wrote:
>
> revisiting this thread from october... sorry

Re: DSv2 question

2019-01-24 Thread Jungtaek Lim
I guess explaining the rationale would help in understanding the situation.
It's about skipping the conversion of params to lowercase before assigning
them to Kafka parameters. (https://github.com/apache/spark/pull/23612) If we
guarantee lowercase keys on the interface(s) we can simply pass them to Kafka
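For context, a minimal sketch of the kind of pass-through being discussed. The "kafka." prefix convention and the helper name are illustrative assumptions, not the code from the PR:

import scala.collection.JavaConverters._
import org.apache.spark.sql.sources.v2.DataSourceOptions

// DataSourceOptions already lowercases its keys, and Kafka consumer config
// keys are all lowercase (e.g. "bootstrap.servers"), so prefixed options
// could be forwarded to the consumer config verbatim.
def kafkaParams(options: DataSourceOptions): Map[String, String] =
  options.asMap().asScala.toMap
    .filter { case (k, _) => k.startsWith("kafka.") }
    .map { case (k, v) => k.stripPrefix("kafka.") -> v }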

Re: moving the spark jenkins job builder repo from dbricks --> spark

2019-01-24 Thread shane knapp
revisiting this thread from october... sorry for the delay in getting around
to this, but the jenkins job builder configs (and associated apache
credentials stored in there) are *directly* related to the work i'm doing
here: https://issues.apache.org/jira/browse/SPARK-26565

Re: DSv2 question

2019-01-24 Thread Joseph Torres
I wouldn't be opposed to also documenting that we canonicalize the keys as
lowercase, but case-insensitivity is, I think, the primary property. It's
important to call out that data source developers don't have to worry about a
semantic difference between option("mykey", "value") and
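To illustrate the point being made, a small sketch ("mysource" is a placeholder format name, not a real source):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Both readers carry the same option: option keys are treated
// case-insensitively, so a data source implementation sees no
// difference between these two.
val a = spark.read.format("mysource").option("mykey", "value")
val b = spark.read.format("mysource").option("myKey", "value")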

DSv2 question

2019-01-24 Thread Gabor Somogyi
Hi All,

Given org.apache.spark.sql.sources.v2.DataSourceOptions, which states the
following:

 * An immutable string-to-string map in which keys are case-insensitive.
 * This is used to represent data source options.

Case-insensitivity can be reached many ways. The implementation provides lowercase
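For reference, a minimal sketch of how that class behaves (against the Spark 2.4 API):

import scala.collection.JavaConverters._
import org.apache.spark.sql.sources.v2.DataSourceOptions

val options = new DataSourceOptions(Map("myKey" -> "value").asJava)
options.get("MYKEY").get()  // "value": lookups ignore case
options.asMap().asScala     // keys come back lowercased, e.g. "mykey"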

Re: Missing SparkR in CRAN

2019-01-24 Thread Felix Cheung
Yes, it was discussed on dev@. We are waiting for 2.3.3 to release to resubmit.

On Thu, Jan 24, 2019 at 5:33 AM Hyukjin Kwon wrote:
> Hi all,
>
> I happened to find SparkR is missing in CRAN. See
> https://cran.r-project.org/web/packages/SparkR/index.html
>
> I remember I saw some threads about

Missing SparkR in CRAN

2019-01-24 Thread Hyukjin Kwon
Hi all,

I happened to find SparkR is missing in CRAN. See
https://cran.r-project.org/web/packages/SparkR/index.html

I remember I saw some threads about this on the spark-dev mailing list a long
time ago, IIRC. Is a fix in progress somewhere? Or is it something I
misunderstood?

Re: Reading compacted Kafka topic is slow

2019-01-24 Thread Gabor Somogyi
Hi Tomas,

I presume the 60 sec window means the trigger interval. A quick win could be
to try structured streaming, because there the trigger interval is optional:
if it is not specified, the system will check for availability of new data as
soon as the previous processing has completed.

BR,
G
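A minimal sketch of what that looks like (broker, topic, and checkpoint path are placeholders):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("kafka-read").getOrCreate()

val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "compacted-topic")
  .load()

// No .trigger(...) specified: each micro-batch starts as soon as the
// previous one finishes, instead of waiting for a fixed interval.
df.writeStream
  .format("console")
  .option("checkpointLocation", "/tmp/checkpoint")
  .start()
  .awaitTermination()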

Reading compacted Kafka topic is slow

2019-01-24 Thread Tomas Bartalos
Hello Spark folks,

I'm reading a compacted Kafka topic with spark 2.4, using the direct stream -
KafkaUtils.createDirectStream(...). I have configured the necessary options
for a compacted stream, so it's processed with CompactedKafkaRDDIterator. It
works well, however in case of many gaps in the topic, the
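For reference, a minimal sketch of the setup described. Broker, topic, and group id are placeholders, and the compacted-stream option referred to is presumably spark.streaming.kafka.allowNonConsecutiveOffsets:

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val conf = new SparkConf()
  .setAppName("compacted-topic-reader")
  // allows offset gaps in a partition (as in compacted topics), which
  // routes batch processing through CompactedKafkaRDDIterator
  .set("spark.streaming.kafka.allowNonConsecutiveOffsets", "true")

val ssc = new StreamingContext(conf, Seconds(60))

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker:9092",
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "example-group",
  "auto.offset.reset" -> "earliest"
)

val stream = KafkaUtils.createDirectStream[String, String](
  ssc, PreferConsistent, Subscribe[String, String](Seq("compacted-topic"), kafkaParams))

stream.foreachRDD(rdd => println(s"batch size: ${rdd.count()}"))
ssc.start()
ssc.awaitTermination()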