[jira] [Updated] (SPARK-26918) All .md should have ASF license header

2019-02-18 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-26918:
-
Description: 
Per policy, all .md files should have the ASF license header, e.g.
[https://raw.githubusercontent.com/apache/arrow/master/docs/README.md]

 or

[https://raw.githubusercontent.com/apache/hadoop/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/filesystem.md]

 

Currently this one does not:

[https://raw.githubusercontent.com/apache/spark/master/docs/sql-reference.md] 
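
(For reference, the header those examples carry is the standard ASF source
header, wrapped in an HTML comment so it does not show up in rendered docs:)

  <!--
  Licensed to the Apache Software Foundation (ASF) under one
  or more contributor license agreements.  See the NOTICE file
  distributed with this work for additional information
  regarding copyright ownership.  The ASF licenses this file
  to you under the Apache License, Version 2.0 (the
  "License"); you may not use this file except in compliance
  with the License.  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing,
  software distributed under the License is distributed on an
  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND,
  either express or implied.  See the License for the
  specific language governing permissions and limitations
  under the License.
  -->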

  was:
per policy, all md files should have the header, like eg. 
[https://raw.githubusercontent.com/apache/arrow/master/docs/README.md]

 

or

 

[https://raw.githubusercontent.com/apache/hadoop/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/filesystem.md]

 

currently it does not

[https://raw.githubusercontent.com/apache/spark/master/docs/sql-reference.md] 


> All .md should have ASF license header
> --
>
> Key: SPARK-26918
> URL: https://issues.apache.org/jira/browse/SPARK-26918
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Felix Cheung
>Priority: Major
>
> per policy, all md files should have the header, like eg. 
> [https://raw.githubusercontent.com/apache/arrow/master/docs/README.md]
>  or
> [https://raw.githubusercontent.com/apache/hadoop/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/filesystem.md]
>  
> currently it does not
> [https://raw.githubusercontent.com/apache/spark/master/docs/sql-reference.md] 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26910) Re-release SparkR to CRAN

2019-02-18 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16771496#comment-16771496
 ] 

Felix Cheung commented on SPARK-26910:
--

2.3.3 has been submitted to CRAN. We are currently waiting for the test results.
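
(For anyone following along, the usual local pre-check before a CRAN submission
is the standard --as-cran check on the built source package. A sketch only - the
package path and version below are illustrative:)

  R CMD build R/pkg
  R CMD check --as-cran SparkR_2.3.3.tar.gz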

> Re-release SparkR to CRAN
> -
>
> Key: SPARK-26910
> URL: https://issues.apache.org/jira/browse/SPARK-26910
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Affects Versions: 2.4.0
>Reporter: Michael Chirico
>Priority: Major
>
> The logical successor to https://issues.apache.org/jira/browse/SPARK-15799
> I don't see anything specifically tracking re-release in the Jira list. It 
> would be helpful to have an issue tracking this to refer to as an outsider, 
> as well as to document what the blockers are in case some outside help could 
> be useful.
>  * Is there a plan to re-release SparkR to CRAN?
>  * What are the major blockers to doing so at the moment?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26910) Re-release SparkR to CRAN

2019-02-18 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16771498#comment-16771498
 ] 

Felix Cheung commented on SPARK-26910:
--

Once that works, we should look into 2.4.1.

> Re-release SparkR to CRAN
> -
>
> Key: SPARK-26910
> URL: https://issues.apache.org/jira/browse/SPARK-26910
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Affects Versions: 2.4.0
>Reporter: Michael Chirico
>    Assignee: Felix Cheung
>Priority: Major
>
> The logical successor to https://issues.apache.org/jira/browse/SPARK-15799
> I don't see anything specifically tracking re-release in the Jira list. It 
> would be helpful to have an issue tracking this to refer to as an outsider, 
> as well as to document what the blockers are in case some outside help could 
> be useful.
>  * Is there a plan to re-release SparkR to CRAN?
>  * What are the major blockers to doing so at the moment?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26910) Re-release SparkR to CRAN

2019-02-18 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung reassigned SPARK-26910:


Assignee: Felix Cheung

> Re-release SparkR to CRAN
> -
>
> Key: SPARK-26910
> URL: https://issues.apache.org/jira/browse/SPARK-26910
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Affects Versions: 2.4.0
>Reporter: Michael Chirico
>    Assignee: Felix Cheung
>Priority: Major
>
> The logical successor to https://issues.apache.org/jira/browse/SPARK-15799
> I don't see anything specifically tracking re-release in the Jira list. It 
> would be helpful to have an issue tracking this to refer to as an outsider, 
> as well as to document what the blockers are in case some outside help could 
> be useful.
>  * Is there a plan to re-release SparkR to CRAN?
>  * What are the major blockers to doing so at the moment?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26858) Vectorized gapplyCollect, Arrow optimization in native R function execution

2019-02-18 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16771494#comment-16771494
 ] 

Felix Cheung commented on SPARK-26858:
--

If I understand correctly, this is the case where Spark doesn't actually care
much about the schema, but it sounds like Arrow does.

Could we infer the schema from the R data.frame? Is there an equivalent
mechanism on the Python side for Pandas to Arrow?
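
(A minimal sketch of the two code paths being compared - this is just existing
SparkR behaviour, not a proposed change:)

  # createDataFrame() already infers a schema from a local R data.frame
  df <- createDataFrame(faithful)
  printSchema(df)                     # eruptions: double, waiting: double

  # gapplyCollect() omits the schema argument that gapply() requires;
  # the return type is only known after the R function has run
  result <- gapplyCollect(df, "waiting", function(key, x) {
    data.frame(waiting = key[[1]], mean_eruptions = mean(x$eruptions))
  })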

> Vectorized gapplyCollect, Arrow optimization in native R function execution
> ---
>
> Key: SPARK-26858
> URL: https://issues.apache.org/jira/browse/SPARK-26858
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR, SQL
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>
> Unlike gapply, gapplyCollect requires additional ser/de steps because it can 
> omit the schema, and Spark SQL doesn't know the return type before execution 
> actually happens.
> In the original code path, this is done using a binary schema. Once gapply is 
> done (SPARK-26761), we can mimic this approach in vectorized gapply to support 
> gapplyCollect.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: SparkR + binary type + how to get value

2019-02-17 Thread Felix Cheung
A byte buffer in R is the raw vector type, so it seems like this is working as 
expected. What do you have in the raw bytes? You can convert them into other 
types or access individual bytes directly...

https://stat.ethz.ch/R-manual/R-devel/library/base/html/raw.html
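
A minimal sketch (the value variable mirrors the snippet quoted below; collect()
returns the binary column value wrapped in a list, so unwrap it first):

  raw_vec <- value[[1]]      # unwrap the list to get the raw vector
  length(raw_vec)            # number of bytes
  as.integer(raw_vec[1:4])   # individual bytes as integers
  rawToChar(raw_vec)         # reinterpret the buffer as a character string
  readBin(raw_vec, what = "integer", n = 1, size = 4)  # or decode binary fields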



From: Thijs Haarhuis 
Sent: Thursday, February 14, 2019 4:01 AM
To: Felix Cheung; user@spark.apache.org
Subject: Re: SparkR + binary type + how to get value

Hi Felix,
Sure..

I have the following code:

  printSchema(results)
  cat("\n\n\n")

  firstRow <- first(results)
  value <- firstRow$value

  cat(paste0("Value Type: '",typeof(value),"'\n\n\n"))
  cat(paste0("Value: '",value,"'\n\n\n"))

results is a Spark Data Frame here.

When I run this code the following is printed to console:

[inline screenshot: printSchema output showing a single column of binary type]

You can see there is only a single column in this sdf, of type binary.
When I collect this value and print its type, it prints that it is a list.

Any idea how to get the actual value, or how to process the individual bytes?

Thanks
Thijs


From: Felix Cheung 
Sent: Thursday, February 14, 2019 5:31 AM
To: Thijs Haarhuis; user@spark.apache.org
Subject: Re: SparkR + binary type + how to get value

Please share your code



From: Thijs Haarhuis 
Sent: Wednesday, February 13, 2019 6:09 AM
To: user@spark.apache.org
Subject: SparkR + binary type + how to get value


Hi all,



Does anybody have any experience in accessing the data from a column which has 
a binary type in a Spark Data Frame in R?

I have a Spark Data Frame which has a column which is of a binary type. I want 
to access this data and process it.

In my case I collect the Spark data frame to an R data frame and access the 
first row.

When I print this row to the console it does print all the hex values correctly.



However, when I access the column it prints that it is a list of 1… When I print 
the type of the child element, it again prints that it is a list.

I expected this value to be of a raw type.



Anybody has some experience with this?



Thanks

Thijs




Re: [DISCUSS] Deprecate support for Spark 2.2.x and earlier version

2019-02-15 Thread Felix Cheung
+1



From: Jeff Zhang 
Sent: Thursday, February 14, 2019 10:28 PM
To: users
Subject: [DISCUSS] Deprecate support for Spark 2.2.x and earlier version

Hi Folks,

Spark 2.2.x will be EOL [1] from January 2019, so I am considering deprecating 
support for Spark 2.2.x and earlier versions in Zeppelin 0.9.0. Deprecation means 
that from Zeppelin 0.9 users are still able to run Spark 2.2.x and earlier 
versions, but will see a warning message in the frontend about this deprecation. 
In the next major version (maybe 0.10, or 1.0), we would remove support for Spark 
2.2.x and earlier versions entirely. The impact for users is the deprecation 
message in the frontend. It may cause issues for users who use the Zeppelin REST 
API to run a paragraph and then fetch and parse the result.

Let me know your concern about this. Thanks

[1] https://spark.apache.org/versioning-policy.html


--
Best Regards

Jeff Zhang


[jira] [Commented] (SPARK-26855) SparkSubmitSuite fails on a clean build

2019-02-13 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16767935#comment-16767935
 ] 

Felix Cheung commented on SPARK-26855:
--

Possibly. It sounds like there are more cases like this, and not just R.

> SparkSubmitSuite fails on a clean build
> ---
>
> Key: SPARK-26855
> URL: https://issues.apache.org/jira/browse/SPARK-26855
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SparkR
>Affects Versions: 2.3.2, 2.4.0
>Reporter: Felix Cheung
>Priority: Major
>
> SparkSubmitSuite
> "include an external JAR in SparkR"
> fails consistently but the test before it, "correctly builds R packages 
> included in a jar with --packages" passes.
> the workaround is to build once with skipTests first, then everything passes.
> ran into this while testing 2.3.3 RC2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: SparkR + binary type + how to get value

2019-02-13 Thread Felix Cheung
Please share your code



From: Thijs Haarhuis 
Sent: Wednesday, February 13, 2019 6:09 AM
To: user@spark.apache.org
Subject: SparkR + binary type + how to get value

Hi all,

Does anybody have any experience in accessing the data from a column which has 
a binary type in a Spark Data Frame in R?
I have a Spark Data Frame which has a column which is of a binary type. I want 
to access this data and process it.
In my case I collect the Spark data frame to an R data frame and access the 
first row.
When I print this row to the console it does print all the hex values correctly.

However, when I access the column it prints that it is a list of 1… When I print 
the type of the child element, it again prints that it is a list.
I expected this value to be of a raw type.

Anybody has some experience with this?

Thanks
Thijs



Re: [VOTE] Accept Training into the Apache Incubator

2019-02-13 Thread Felix Cheung
+1

On Wed, Feb 13, 2019 at 7:23 AM Matt Sicker  wrote:

> +1
>
> Would be interesting if this project includes training for developing
> Apache projects in any way, or would that make more sense under ComDev
> or some other project?
>
> On Wed, 13 Feb 2019 at 09:09, Dmitriy Pavlov  wrote:
> >
> > +1 (non-binding)
> >
> > ср, 13 февр. 2019 г. в 18:05, Ciprian Borodescu <
> ciprian.borode...@gmail.com
> > >:
> >
> > > +1
> > >
> > > On Wed, Feb 13, 2019 at 4:52 PM Thomas Weise  wrote:
> > >
> > > > +1 (binding)
> > > >
> > > >
> > > > On Wed, Feb 13, 2019, 6:40 AM Mohammad Asif Siddiqui <
> > > > asifdxtr...@apache.org>
> > > > wrote:
> > > >
> > > > > +1 (binding)
> > > > >
> > > > > Regards
> > > > > Asif
> > > > >
> > > > > On Wed, Feb 13, 2019 at 8:09 PM Julian Feinauer <
> > > > > j.feina...@pragmaticminds.de> wrote:
> > > > >
> > > > > > +1 (non-binding)
> > > > > > I really liked the idea from the start and probably will find
> some
> > > time
> > > > > to
> > > > > > contribute!
> > > > > >
> > > > > > Julian
> > > > > >
> > > > > > Am 13.02.19, 15:18 schrieb "Kevin A. McGrail" <
> kmcgr...@apache.org
> > > >:
> > > > > >
> > > > > > +1 Binding.  I'll also try again to get  Udacity, Udemy,
> > > Coursera,
> > > > > > Pluralsight involved now that this is going to a formal
> incubator
> > > > > > podling.
> > > > > > I am hoping once a domino falls, more will help.
> > > > > >
> > > > > > Regards,
> > > > > > KAM
> > > > > > --
> > > > > > Kevin A. McGrail
> > > > > > Member, Apache Software Foundation
> > > > > > Chair Emeritus Apache SpamAssassin Project
> > > > > > https://www.linkedin.com/in/kmcgrail - 703.798.0171
> > > <(703)%20798-0171>
> > > > > >
> > > > > >
> > > > > > On Wed, Feb 13, 2019 at 7:00 AM Vinayakumar B <
> > > > > vinayakum...@apache.org
> > > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > +1 (binding)
> > > > > > >
> > > > > > > -Vinay
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Feb 13, 2019 at 4:58 PM Hans-Peter Zorn <
> > > hz...@inovex.de
> > > > >
> > > > > > wrote:
> > > > > > >
> > > > > > > > +1 (non-binding)
> > > > > > > >
> > > > > > > > Looking forward to work on this!
> > > > > > > > Thanks,
> > > > > > > > Hans-Peter
> > > > > > > >
> > > > > > > > > Am 13.02.2019 um 08:57 schrieb Lars Francke <
> > > > > > lars.fran...@gmail.com>:
> > > > > > > > >
> > > > > > > > > Hi everyone,
> > > > > > > > >
> > > > > > > > > we've discussed the proposal for the Training project
> in
> > > [1]
> > > > > and
> > > > > > [2].
> > > > > > > The
> > > > > > > > > proposal itself can be found on the wiki[3].
> > > > > > > > >
> > > > > > > > > According to the Incubator rules[4] I'd like to call a
> vote
> > > > to
> > > > > > accept
> > > > > > > the
> > > > > > > > > new "Training" project as a podling in the Apache
> > > Incubator.
> > > > > > > > >
> > > > > > > > > A vote for accepting a new Apache Incubator podling is
> a
> > > > > > majority vote.
> > > > > > > > > Everyone is welcome to vote, only Incubator PMC member
> > > votes
> > > > > are
> > > > > > > binding.
> > > > > > > > > It would be helpful (but not required) if you could
> add a
> > > > > comment
> > > > > > > stating
> > > > > > > > > whether your vote is binding or non-binding.
> > > > > > > > >
> > > > > > > > > This vote will run for at least 72 hours (but I expect
> to
> > > > keep
> > > > > > it open
> > > > > > > > for
> > > > > > > > > longer). Please VOTE as follows:
> > > > > > > > >
> > > > > > > > > [ ] +1 Accept Training into the Apache Incubator
> > > > > > > > > [ ] +0 Abstain
> > > > > > > > > [ ] -1 Do not accept Training into the Apache Incubator
> > > > because
> > > > > > ...
> > > > > > > > >
> > > > > > > > > Thank you for everyone who decided to join in in the
> past
> > > > > > discussions!
> > > > > > > > > Lars
> > > > > > > > >
> > > > > > > > > [1] <
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://lists.apache.org/thread.html/5c00016b769135cc302bb2ce4e5f6bbfeeda933a07e9c38b5017d651@%3Cgeneral.incubator.apache.org%3E
> > > > > > > > >>
> > > > > > > > >
> > > > > > > > > [2] <
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://lists.apache.org/thread.html/9cb4d7eef73e0d526e0124944c3d37325aa892675351a1eed0a25de3@%3Cgeneral.incubator.apache.org%3E
> > > > > > > > >>
> > > > > > > > >
> > > > > > > > > [3] <
> > > > > https://wiki.apache.org/incubator/TrainingProposal#preview>
> > > > > > > > >
> > > > > > > > > [4] <
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> https://incubator.apache.org/policy/incubation.html#approval_of_proposal_by_sponsor
> > > > > >   

Re: Failing on deploying artifacts

2019-02-12 Thread Felix Cheung
Ah good. Is this documented as a step somewhere? (Just want to see if this 
needs to be clarified)



From: Seunghyun Lee 
Sent: Tuesday, February 12, 2019 7:14 PM
To: dev@pinot.apache.org
Subject: Re: Failing on deploying artifacts

This was because I hadn't set up access to Nexus yet. I have filed a ticket with
INFRA: https://issues.apache.org/jira/browse/INFRA-17840

Best,
Seunghyun

On Tue, Feb 12, 2019 at 6:10 PM Seunghyun Lee  wrote:

> Hi mentors,
>
> I am trying to upload our artifacts to the Nexus staging repository. I was
> following the Apache documentation and have set up the "~/.m2/settings.xml" file
> correctly. Since the job fails after uploading the "pinot-0.1.0.pom" file,
> I think it's not an authentication issue.
>
> When I run "mvn release:perform", I get the following error:
>
> [INFO] [INFO] --- maven-deploy-plugin:2.8.2:deploy (default-deploy) @
> pinot ---
> [INFO] Uploading:
> https://repository.apache.org/service/local/staging/deploy/maven2/org/apache/pinot/pinot/0.1.0/pinot-0.1.0.pom
> [INFO] Progress (1): 2.0/45 kB
> [INFO] Progress (1): 4.1/45 kB
> [INFO] Progress (1): 6.1/45 kB
> [INFO] Progress (1): 8.2/45 kB
> [INFO]
> 
>
> ~/workspace/pinot 0.1.0
> [INFO] [INFO] Total time: 6.280 s
> [INFO] [INFO] Finished at: 2019-02-12T18:05:20-08:00
> [INFO] [INFO] Final Memory: 44M/1168M
> [INFO] [INFO]
> 
> [INFO] [ERROR] Failed to execute goal
> org.apache.maven.plugins:maven-deploy-plugin:2.8.2:deploy (default-deploy)
> on project pinot: Failed to deploy artifacts: Could not transfer artifact
> org.apache.pinot:pinot:pom:0.1.0 from/to apache.releases.https (
> https://repository.apache.org/service/local/staging/deploy/maven2): *Failed
> to transfer file:
> https://repository.apache.org/service/local/staging/deploy/maven2/org/apache/pinot/pinot/0.1.0/pinot-0.1.0.pom
> .
> Return code is: 400, ReasonPhrase: Bad Request. *-> [Help 1]
> [INFO] [ERROR]
> [INFO] [ERROR] To see the full stack trace of the errors, re-run Maven
> with the -e switch.
> [INFO] [ERROR] Re-run Maven using the -X switch to enable full debug
> logging.
> [INFO] [ERROR]
> [INFO] [ERROR] For more information about the errors and possible
> solutions, please read the following articles:
> [INFO] [ERROR] [Help 1]
> http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
>
> Is someone familiar with the maven-release tool and has faced this issue? I
> have spent hours googling but haven't found a good solution yet.
>
> Best,
> Seunghyun
>


[jira] [Commented] (SPARK-26855) SparkSubmitSuite fails on a clean build

2019-02-11 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16765759#comment-16765759
 ] 

Felix Cheung commented on SPARK-26855:
--

IMO we have two options:
 1. Document that the tests only pass after a clean build with skipTests 
(commands sketched below).
 2. Re-order tests: suppose test A depends on module B being built; we could move 
test A to run after B (or rather, simply make it a test of B).
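
A sketch of option 1 as commands (the flags and suite name are the usual ones
for the Spark tree; exact invocation may vary by branch):

  # build all modules once without running tests
  ./build/mvn -DskipTests clean package
  # then the suite passes on a subsequent test run, e.g.
  ./build/mvn test -Dtest=none -DwildcardSuites=org.apache.spark.deploy.SparkSubmitSuite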

> SparkSubmitSuite fails on a clean build
> ---
>
> Key: SPARK-26855
> URL: https://issues.apache.org/jira/browse/SPARK-26855
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SparkR
>Affects Versions: 2.3.2, 2.4.0
>Reporter: Felix Cheung
>Priority: Major
>
> SparkSubmitSuite
> "include an external JAR in SparkR"
> fails consistently but the test before it, "correctly builds R packages 
> included in a jar with --packages" passes.
> the workaround is to build once with skipTests first, then everything passes.
> ran into this while testing 2.3.3 RC2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: java.lang.IllegalArgumentException: Unsupported class file major version 55

2019-02-10 Thread Felix Cheung
And it might not work completely. Spark only officially supports JDK 8.

I’m not sure if JDK 9+ support is complete?



From: Jungtaek Lim 
Sent: Thursday, February 7, 2019 5:22 AM
To: Gabor Somogyi
Cc: Hande, Ranjit Dilip (Ranjit); user@spark.apache.org
Subject: Re: java.lang.IllegalArgumentException: Unsupported class file major 
version 55

ASM 6 doesn't support Java 11. In the master branch (for Spark 3.0) there's a 
dependency upgrade to ASM 7 and also some effort (if my understanding is right) 
to support Java 11, so you may need to use a lower JDK version (8 is safest) for 
Spark 2.4.0, and try out the master branch to prepare for Java 11.
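
(A quick sketch of checking and switching the JDK; the JDK 8 install path below
is only illustrative:)

  java -version                                        # confirm which JDK is picked up
  export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64   # illustrative path
  export PATH="$JAVA_HOME/bin:$PATH"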

Thanks,
Jungtaek Lim (HeartSaVioR)

2019년 2월 7일 (목) 오후 9:18, Gabor Somogyi 
mailto:gabor.g.somo...@gmail.com>>님이 작성:
Hi Hande,

"Unsupported class file major version 55" means java incompatibility.
This error means you're trying to load a Java "class" file that was compiled 
with a newer version of Java than you have installed.
For example, your .class file could have been compiled for JDK 8, and you're 
trying to run it with JDK 7.
Are you sure 11 is the only JDK which is the default?

Small number of peoples playing with JDK 11 but not heavily tested and used.
Spark may or may not work but not suggested for production in general.

BR,
G


On Thu, Feb 7, 2019 at 12:53 PM Hande, Ranjit Dilip (Ranjit) 
mailto:ha...@avaya.com>> wrote:
Hi,

I am developing a Java process which will consume data from Kafka using 
Apache Spark Streaming.
For this I am using the following:

Java:
openjdk version "11.0.1" 2018-10-16 LTS
OpenJDK Runtime Environment Zulu11.2+3 (build 11.0.1+13-LTS) OpenJDK 64-Bit 
Server VM Zulu11.2+3 (build 11.0.1+13-LTS, mixed mode)

Maven: (Spark Streaming)

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
  <version>2.4.0</version>
</dependency>

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming_2.11</artifactId>
  <version>2.4.0</version>
</dependency>


I am able to compile the project successfully, but when I try to run it I get 
the following error:

{"@timestamp":"2019-02-07T11:54:30.624+05:30","@version":"1","message":"Application
 run 
failed","logger_name":"org.springframework.boot.SpringApplication","thread_name":"main","level":"ERROR","level_value":4,"stack_trace":"java.lang.IllegalStateException:
 Failed to execute CommandLineRunner at 
org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:816)
 at 
org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:797)
 at org.springframework.boot.SpringApplication.run(SpringApplication.java:324) 
at 
com.avaya.measures.AgentMeasures.AgentMeasuresApplication.main(AgentMeasuresApplication.java:41)
 at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method) at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.base/java.lang.reflect.Method.invoke(Method.java:566) at 
org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:48) 
at org.springframework.boot.loader.Launcher.launch(Launcher.java:87) at

org.springframework.boot.loader.Launcher.launch(Launcher.java:50) at 
org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:51)\r\nCaused 
by: java.lang.IllegalArgumentException: Unsupported class file major version 55 
at

 org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:166) at 
org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:148) at 
org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:136) at 
org.apache.xbean.asm6.ClassReader.<init>(ClassReader.java:237) at 
org.apache.spark.util.ClosureCleaner$.getClassReader(ClosureCleaner.scala:49) 
at 
org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:517)
 at 
org.apache.spark.util.FieldAccessFinder$$anon$3$$anonfun$visitMethodInsn$2.apply(ClosureCleaner.scala:500)
 at 
scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
 at 
scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:134)
 at 
scala.collection.mutable.HashMap$$anon$1$$anonfun$foreach$2.apply(HashMap.scala:134)
 at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:236) 
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40) at 
scala.collection.mutable.HashMap$$anon$1.foreach(HashMap.scala:134) at 
scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:732) 
at 
org.apache.spark.util.FieldAccessFinder$$anon$3.visitMethodInsn(ClosureCleaner.scala:500)
 at org.apache.xbean.asm6.ClassReader.readCode(ClassReader.java:2175) at 
org.apache.xbean.asm6.ClassReader.readMethod(ClassReader.java:1238) at 
org.apache.xbean.asm6.ClassReader.accept(ClassReader.java:631) at 
org.apache.xbean.asm6.ClassReader.accept(ClassReader.java:355) at 

Re: Vectorized R gapply[Collect]() implementation

2019-02-10 Thread Felix Cheung
This is super awesome!



From: Shivaram Venkataraman 
Sent: Saturday, February 9, 2019 8:33 AM
To: Hyukjin Kwon
Cc: dev; Felix Cheung; Bryan Cutler; Liang-Chi Hsieh; Shivaram Venkataraman
Subject: Re: Vectorized R gapply[Collect]() implementation

Those speedups look awesome! Great work Hyukjin!

Thanks
Shivaram

On Sat, Feb 9, 2019 at 7:41 AM Hyukjin Kwon  wrote:
>
> Guys, as continuation of Arrow optimization for R DataFrame to Spark 
> DataFrame,
>
> I am trying to make a vectorized gapply[Collect] implementation as an 
> experiment like vectorized Pandas UDFs
>
> It brought 820%+ performance improvement. See 
> https://github.com/apache/spark/pull/23746
>
> Please come and take a look if you're interested in R APIs :D. I have already 
> cc'ed some people I know but please come, review and discuss for both Spark 
> side and Arrow side.
>
> This Arrow optimization job is being done under 
> https://issues.apache.org/jira/browse/SPARK-26759 . Please feel free to take 
> one if anyone of you is interested in it.
>
> Thanks.


Re: Podlings not following ASF release policy

2019-02-10 Thread Felix Cheung
... so I think we need to provide an option here.

No question that any unauthorized release is not allowed. But what about
while voting for a release?

It sounds like httpd has the same problem since some of these listed
“releases” are failed vote attempts (and so not official releases)

From a casual check this seems like a very common problem for many projects
under github.com/apache/ as all git tags of RC show up as “release” on
GitHub automatically, and there isn’t a documented way to turn off this
behavior.

What are our options?
a) do not push git tag for RC - it’s possible to vote on a git commit id
b) ok with RC git tag so long as they are pre-release on Github (and
annotated tag clearly stating not a release) and remove as soon as possible
(eg new RC, vote pass)
c) get GitHub to fix this



On Fri, Feb 8, 2019 at 11:59 PM Daniel Gruno  wrote:

> On 2/9/19 8:50 AM, Justin Mclean wrote:
> > HI,
> >
> >>  From what I can see with httpd, the issue is the same(?)
> >
> > Not quite as I not see any release candidates listed.
>
> ah, well, httpd doesn't do release candidates :p we tag a release, vote
> on it, if it fails, you burn the version number and use a new one. so
> you'll see release "candidates" but just not recognize them unless you
> know which versions failed to get the proper votes.
>
> >
> > Thanks,
> > Justin
> >
> > -
> > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > For additional commands, e-mail: general-h...@incubator.apache.org
> >
>
>
> -
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org
>
>


Re: [VOTE] Release Apache Spark 2.3.3 (RC2)

2019-02-10 Thread Felix Cheung
+1
See note

Tested building from source and running tests.
Also tested SparkR basics - I ran more tests on RC1 and checked there has been no 
change in R since, so I'm OK with that.

Note:
1. Opened https://issues.apache.org/jira/browse/SPARK-26855 on the 
SparkSubmitSuite failure - (thanks to Sean’s tip) I don’t think it’s a blocker.

2. Ran into a failure in HiveExternalCatalogVersionsSuite, but it passed on the 
2nd run. (How reliable is archive.apache.org? It has failed for
me before.)
WARN org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite: Failed to 
download Spark 2.3.2 from 
https://archive.apache.org/dist/spark/spark-2.3.2/spark-2.3.2-bin-hadoop2.7.tgz:
 Socket closed
org.apache.spark.sql.hive.HiveExternalCatalogVersionsSuite *** ABORTED ***
Exception encountered when invoking run on a nested suite - Unable to download 
Spark 2.3.2 (HiveExternalCatalogVersionsSuite.scala:97)

3. There are a fair number of changes in Python and SQL - someone should test those.

4. Last time the k8s integration tests were broken, and they aren’t built by 
default. Could someone test with -Pkubernetes -Pkubernetes-integration-tests?

SPARK-26482 broke the integration tests



From: John Zhuge 
Sent: Saturday, February 9, 2019 6:25 PM
To: Felix Cheung
Cc: Takeshi Yamamuro; Spark dev list
Subject: Re: [VOTE] Release Apache Spark 2.3.3 (RC2)

Not me. I am running zulu8, maven, and hadoop-2.7.

On Sat, Feb 9, 2019 at 5:42 PM Felix Cheung 
mailto:felixcheun...@hotmail.com>> wrote:
One test in SparkSubmitSuite is consistently failing for me. Anyone seeing that?



From: Takeshi Yamamuro mailto:linguin@gmail.com>>
Sent: Saturday, February 9, 2019 5:25 AM
To: Spark dev list
Subject: Re: [VOTE] Release Apache Spark 2.3.3 (RC2)

Sorry, but I forgot to check ` -Pdocker-integration-tests` for the JDBC 
integration tests.
I run these tests, and then I checked if they are passed.

On Sat, Feb 9, 2019 at 5:26 PM Herman van Hovell 
mailto:her...@databricks.com>> wrote:
I count 2 binding votes :)...

Op vr 8 feb. 2019 om 22:36 schreef Felix Cheung 
mailto:felixcheun...@hotmail.com>>
Nope, still only 1 binding vote ;)



From: Mark Hamstra mailto:m...@clearstorydata.com>>
Sent: Friday, February 8, 2019 7:30 PM
To: Marcelo Vanzin
Cc: Takeshi Yamamuro; Spark dev list
Subject: Re: [VOTE] Release Apache Spark 2.3.3 (RC2)

There are 2. C'mon Marcelo, you can make it 3!

On Fri, Feb 8, 2019 at 5:03 PM Marcelo Vanzin  
wrote:
Hi Takeshi,

Since we only really have one +1 binding vote, do you want to extend
this vote a bit?

I've been stuck on a few things but plan to test this (setting things
up now), but it probably won't happen before the deadline.

On Tue, Feb 5, 2019 at 5:07 PM Takeshi Yamamuro 
mailto:linguin@gmail.com>> wrote:
>
> Please vote on releasing the following candidate as Apache Spark version 
> 2.3.3.
>
> The vote is open until February 8 6:00PM (PST) and passes if a majority +1 
> PMC votes are cast, with
> a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 2.3.3
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.3.3-rc2 (commit 
> 66fd9c34bf406a4b5f86605d06c9607752bd637a):
> https://github.com/apache/spark/tree/v2.3.3-rc2
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.3.3-rc2-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1298/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.3.3-rc2-docs/
>
> The list of bug fixes going into 2.3.3 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12343759
>
> FAQ
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your projects resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with a out of date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 2.3.3?
> ==

[jira] [Updated] (SPARK-26855) SparkSubmitSuite fails on a clean build

2019-02-10 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-26855:
-
Description: 
SparkSubmitSuite

"include an external JAR in SparkR"

fails consistently, but the test before it, "correctly builds R packages 
included in a jar with --packages", passes.

The workaround is to build once with skipTests first; then everything passes.

Ran into this while testing 2.3.3 RC2.

> SparkSubmitSuite fails on a clean build
> ---
>
> Key: SPARK-26855
> URL: https://issues.apache.org/jira/browse/SPARK-26855
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SparkR
>Affects Versions: 2.3.2
>Reporter: Felix Cheung
>Priority: Major
>
> SparkSubmitSuite
> "include an external JAR in SparkR"
> fails consistently but the test before it, "correctly builds R packages 
> included in a jar with --packages" passes.
> the workaround is to build once with skipTests first, then everything passes.
> ran into this while testing 2.3.3 RC2.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26855) SparkSubmitSuite fails on a clean build

2019-02-10 Thread Felix Cheung (JIRA)
Felix Cheung created SPARK-26855:


 Summary: SparkSubmitSuite fails on a clean build
 Key: SPARK-26855
 URL: https://issues.apache.org/jira/browse/SPARK-26855
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, SparkR
Affects Versions: 2.3.2
Reporter: Felix Cheung






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 2.3.3 (RC2)

2019-02-09 Thread Felix Cheung
One test in SparkSubmitSuite is consistently failing for me. Anyone seeing that?



From: Takeshi Yamamuro 
Sent: Saturday, February 9, 2019 5:25 AM
To: Spark dev list
Subject: Re: [VOTE] Release Apache Spark 2.3.3 (RC2)

Sorry, but I forgot to check ` -Pdocker-integration-tests` for the JDBC 
integration tests.
I run these tests, and then I checked if they are passed.

On Sat, Feb 9, 2019 at 5:26 PM Herman van Hovell 
mailto:her...@databricks.com>> wrote:
I count 2 binding votes :)...

Op vr 8 feb. 2019 om 22:36 schreef Felix Cheung 
mailto:felixcheun...@hotmail.com>>
Nope, still only 1 binding vote ;)



From: Mark Hamstra mailto:m...@clearstorydata.com>>
Sent: Friday, February 8, 2019 7:30 PM
To: Marcelo Vanzin
Cc: Takeshi Yamamuro; Spark dev list
Subject: Re: [VOTE] Release Apache Spark 2.3.3 (RC2)

There are 2. C'mon Marcelo, you can make it 3!

On Fri, Feb 8, 2019 at 5:03 PM Marcelo Vanzin  
wrote:
Hi Takeshi,

Since we only really have one +1 binding vote, do you want to extend
this vote a bit?

I've been stuck on a few things but plan to test this (setting things
up now), but it probably won't happen before the deadline.

On Tue, Feb 5, 2019 at 5:07 PM Takeshi Yamamuro 
mailto:linguin@gmail.com>> wrote:
>
> Please vote on releasing the following candidate as Apache Spark version 
> 2.3.3.
>
> The vote is open until February 8 6:00PM (PST) and passes if a majority +1 
> PMC votes are cast, with
> a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 2.3.3
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.3.3-rc2 (commit 
> 66fd9c34bf406a4b5f86605d06c9607752bd637a):
> https://github.com/apache/spark/tree/v2.3.3-rc2
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.3.3-rc2-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1298/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.3.3-rc2-docs/
>
> The list of bug fixes going into 2.3.3 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12343759
>
> FAQ
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your projects resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with a out of date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 2.3.3?
> ===
>
> The current list of open tickets targeted at 2.3.3 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target 
> Version/s" = 2.3.3
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
> P.S.
> I checked all the tests passed in the Amazon Linux 2 AMI;
> $ java -version
> openjdk version "1.8.0_191"
> OpenJDK Runtime Environment (build 1.8.0_191-b12)
> OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)
> $ ./build/mvn -Pyarn -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos -Psparkr 
> test
>
> --
> ---
> Takeshi Yamamuro



--
Marcelo

-
To unsubscribe e-mail: 
dev-unsubscr...@spark.apache.org<mailto:dev-unsubscr...@spark.apache.org>



--
---
Takeshi Yamamuro


[jira] [Commented] (SPARK-26762) Arrow optimization for conversion from Spark DataFrame to R DataFrame

2019-02-08 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764092#comment-16764092
 ] 

Felix Cheung commented on SPARK-26762:
--

does this include head, take etc?
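
(i.e. the other SparkR calls that materialize rows locally - a quick illustration
assuming the usual SparkR session; whether each goes through the same Arrow path
is exactly the question:)

  df <- createDataFrame(faithful)
  collect(df)     # SPARK-26762: Spark DataFrame -> R data.frame
  head(df)        # head/take/first also bring rows back to the driver
  take(df, 5)
  first(df)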

> Arrow optimization for conversion from Spark DataFrame to R DataFrame
> -
>
> Key: SPARK-26762
> URL: https://issues.apache.org/jira/browse/SPARK-26762
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR, SQL
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Like SPARK-25981, {{collect(rdf)}} can be optimized via Arrow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-26762) Arrow optimization for conversion from Spark DataFrame to R DataFrame

2019-02-08 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764092#comment-16764092
 ] 

Felix Cheung edited comment on SPARK-26762 at 2/9/19 7:35 AM:
--

does this include head, take, show etc?


was (Author: felixcheung):
does this include head, take etc?

> Arrow optimization for conversion from Spark DataFrame to R DataFrame
> -
>
> Key: SPARK-26762
> URL: https://issues.apache.org/jira/browse/SPARK-26762
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR, SQL
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Like SPARK-25981, {{collect(rdf)}} can be optimized via Arrow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: Podlings not following ASF release policy

2019-02-08 Thread Felix Cheung
I’d agree - it’s a “feature” of GitHub and I didn’t see a way to turn off
“turning a git tag into a release”.

I see that there is a API to edit a release to say it is “pre-release”
https://developer.github.com/v3/repos/releases/#edit-a-release

IMO either way it should be ok because
a) the tag is clearly indicating *-rc0 *-rc1 and so on (if not ok, it can
be set as a prerelease)
b) when a new RC is being rolled, the last RC git tag can be removed (which
should also eliminate the github “release”)
c) when the RC becomes the official release, the git tag can be replaced
and the final git tag is the release
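
(A sketch of option b) using the edit-a-release API linked above - the repo name,
release id and token are placeholders:)

  curl -X PATCH \
    -H "Authorization: token $GITHUB_TOKEN" \
    -d '{"prerelease": true}' \
    https://api.github.com/repos/apache/<repo>/releases/<release_id>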


On Fri, Feb 8, 2019 at 11:14 PM Seunghyun Lee  wrote:

> Hi all,
>
> I'm Seunghyun who's working on the first Apache release for Pinot project.
>
> When I played around with maven-release-plugin today, I found out that
> Github would automatically generate the release at "
> github.com/apache/incubator-xxx/release" when we update the tag. I think
> that some projects may inadvertently create a release on Github for their
> release candidates.
>
> I'm a bit confused because we need to provide the tag as part of release
> process while tagging a release candidate creates a release on Github.
>
> If you refer to the Gobblin's release page [1], you can observe that
> release candidate was added to the release page before the official release
> got updated. I have looked into Github configurations, I haven't find one
> that prevents this yet.
>
> Best,
> Seunghyun
>
> [1] https://github.com/apache/incubator-gobblin/releases
>
>
> On Fri, Feb 8, 2019 at 7:57 PM Dave Fisher  wrote:
>
> > Hi Julian,
> >
> > A perfect example!
> >
> > I think we should add a checkbox to the podling report where the podling
> > indicates when they are fully in compliance with release policy. Until
> that
> > is done we don’t worry.
> >
> > Additionally, Infra could tag and “release” with appropriate description
> > when they move a GitHub repository into the foundation.
> >
> > Regards,
> > Dave
> >
> > Sent from my iPhone
> >
> > > On Feb 8, 2019, at 5:41 PM, Julian Hyde  wrote:
> > >
> > > I’m a mentor of Druid.
> > >
> > > We allowed Druid to continue making releases outside of Apache during
> > incubation because ASF releases were not possible. There were various
> > reasons - they could not release from main line because IP transfer had
> not
> > been completed (if I recall correctly), and they also needed to make
> > bug-fix releases of existing releases. Druid is an active project with
> > large installations in production, some of them at major companies;
> pausing
> > releases for 6 - 9 months while transitioning into ASF would have been
> > hugely damaging to the project and its community.
> > >
> > > The project tried to do everything by the book: they sought permission
> > for releases outside of ASF, disclosed the non-ASF releases in its
> reports,
> > and made an official Apache release as soon as they could. If there is
> > anything they could/should have done differently, let’s discuss, and
> write
> > down guidelines for future podlings that are in a similar situation.
> > >
> > > Julian
> > >
> > >
> > >
> > >> On Feb 8, 2019, at 5:16 PM, Justin Mclean 
> > wrote:
> > >>
> > >> Hi,
> > >>
> > >> One of the issues I’ve seen is that project continues to make releases
> > in GitHub after being accepted into the incubator, in some case is this
> > because the repo hasn’t been moved over yet, in other cases it’s because
> > they believe that the code base is not Apache ready. What should we do in
> > this situations? From what I seen it usually just delays transfer of the
> > repo and encourages unapproved releases. I would would push for mentors
> > speeding up that transfer rather than allowing unapproved releases. What
> do
> > others think?
> > >>
> > >> Thanks,
> > >> Justin
> > >> -
> > >> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > >> For additional commands, e-mail: general-h...@incubator.apache.org
> > >>
> > >
> > >
> > > -
> > > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > > For additional commands, e-mail: general-h...@incubator.apache.org
> > >
> >
> >
> > -
> > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > For additional commands, e-mail: general-h...@incubator.apache.org
> >
> >
>


Re: [DISCUSS] Change default executor log URLs for YARN

2019-02-08 Thread Felix Cheung
For this case I’d agree with Ryan. I haven’t followed this thread and the 
details of the change since it’s way too much for me to consume “in my free 
time” (which is 0 nowadays) but I’m pretty sure the existing behavior works for 
us and very likely we don’t want it to change because of some proxy magic we do 
behind the scene.

I’d also agree a config flag is not always the best way, but in this case the 
existing, established behavior doesn’t seem broken...

I could be wrong though.



From: Ryan Blue 
Sent: Friday, February 8, 2019 4:39 PM
To: Sean Owen
Cc: Jungtaek Lim; dev
Subject: Re: [DISCUSS] Change default executor log URLs for YARN

I'm not sure that many people need this, so it is hard to make a decision. I'm 
reluctant to change the current behavior if the result is a new papercut to 99% 
of users and a win for 1%. The suggested change will work for 100% of users, so 
if we don't want a flag then we should go with that. But I would certainly want 
to turn it off in our environment because it doesn't provide any value for us 
and would annoy our users.

On Fri, Feb 8, 2019 at 4:18 PM Sean Owen 
mailto:sro...@gmail.com>> wrote:
Is a flag needed? You know me, I think flags are often failures of
design, or disagreement punted to the user. I can understand retaining
old behavior under a flag where the behavior change could be
problematic for some users or facilitate migration, but this is just a
change to some UI links no? the underlying links don't change.
On Fri, Feb 8, 2019 at 5:41 PM Ryan Blue 
mailto:rb...@netflix.com>> wrote:
>
> I suggest using the current behavior as the default and add a flag to 
> implement the behavior you're suggesting: to link to the logs path in YARN 
> instead of directly to stderr and stdout.
>
> On Fri, Feb 8, 2019 at 3:33 PM Jungtaek Lim 
> mailto:kabh...@gmail.com>> wrote:
>>
>> Ryan,
>>
>> actually I'm not clear about your suggestion. For me three possible options 
>> here:
>>
>> 1. If we want to let users be able to completely rewrite log urls, that's 
>> SPARK-26792. For SHS we already addressed it.
>> 2. We could let users turning on/off flag option to just get one url or 
>> default two stdout/stderr urls.
>> 3. We could let users enumerate file names they want to link, and create log 
>> links for each file.
>>
>> Which one do you suggest?
>


--
Ryan Blue
Software Engineer
Netflix


Re: [VOTE] Release Apache Spark 2.3.3 (RC2)

2019-02-08 Thread Felix Cheung
Nope, still only 1 binding vote ;)



From: Mark Hamstra 
Sent: Friday, February 8, 2019 7:30 PM
To: Marcelo Vanzin
Cc: Takeshi Yamamuro; Spark dev list
Subject: Re: [VOTE] Release Apache Spark 2.3.3 (RC2)

There are 2. C'mon Marcelo, you can make it 3!

On Fri, Feb 8, 2019 at 5:03 PM Marcelo Vanzin  
wrote:
Hi Takeshi,

Since we only really have one +1 binding vote, do you want to extend
this vote a bit?

I've been stuck on a few things but plan to test this (setting things
up now), but it probably won't happen before the deadline.

On Tue, Feb 5, 2019 at 5:07 PM Takeshi Yamamuro 
mailto:linguin@gmail.com>> wrote:
>
> Please vote on releasing the following candidate as Apache Spark version 
> 2.3.3.
>
> The vote is open until February 8 6:00PM (PST) and passes if a majority +1 
> PMC votes are cast, with
> a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 2.3.3
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.3.3-rc2 (commit 
> 66fd9c34bf406a4b5f86605d06c9607752bd637a):
> https://github.com/apache/spark/tree/v2.3.3-rc2
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.3.3-rc2-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1298/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.3.3-rc2-docs/
>
> The list of bug fixes going into 2.3.3 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12343759
>
> FAQ
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your projects resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with a out of date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 2.3.3?
> ===
>
> The current list of open tickets targeted at 2.3.3 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target 
> Version/s" = 2.3.3
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
> P.S.
> I checked all the tests passed in the Amazon Linux 2 AMI;
> $ java -version
> openjdk version "1.8.0_191"
> OpenJDK Runtime Environment (build 1.8.0_191-b12)
> OpenJDK 64-Bit Server VM (build 25.191-b12, mixed mode)
> $ ./build/mvn -Pyarn -Phadoop-2.7 -Phive -Phive-thriftserver -Pmesos -Psparkr 
> test
>
> --
> ---
> Takeshi Yamamuro



--
Marcelo

-
To unsubscribe e-mail: 
dev-unsubscr...@spark.apache.org



Re: [MENTORS] Unapproved releases

2019-02-08 Thread Felix Cheung
I think this happens automatically by GitHub when a git tag is pushed?



From: kishore g 
Sent: Friday, February 8, 2019 3:34 PM
To: dev@pinot.apache.org
Subject: Re: [MENTORS] Unapproved releases

Thanks Justin. I will take a look at it. Olivier, we might need your
expertise here.

On Fri, Feb 8, 2019 at 2:24 PM Justin Mclean  wrote:

> Hi,
>
> Sorry to bother you again, but It seems this is not sinking in. I can see
> you have just marked a release candidate as a release here. [1] Mentors can
> you please deal with this.
>
> Thanks,
> Justin
>
> 1. https://github.com/apache/incubator-pinot/releases
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@pinot.apache.org
> For additional commands, e-mail: dev-h...@pinot.apache.org
>
>


Re: Mentor sign-off due next Tuesday (February 12th)

2019-02-07 Thread Felix Cheung
Done as well. Also updated the note about committer.



From: Olivier Lamy 
Sent: Wednesday, February 6, 2019 5:22 PM
To: dev@pinot.apache.org
Subject: Re: Mentor sign-off due next Tuesday (February 12th)

done!

On Thu, 7 Feb 2019 at 08:52, Seunghyun Lee  wrote:

> Dear Mentors,
>
> Can we get the sign-offs for our podling reports? It seems that it's due
> next Tuesday.
>
> Best,
> Seunghyun
>
> -- Forwarded message -
> From: Justin Mclean 
> Date: Wed, Feb 6, 2019 at 2:44 PM
> Subject: Mentor sign-off due next Tuesday (February 12th)
> To: 
>
>
> Hi,
>
> Mentor sign-off of the podling reports is due next Tuesday.
>
> So far these projects don’t have any sign-offs:
> - BRPC (missing report)
> - Doris
> - ECharts
> - Heron
> - Pinot
> - ShardingSphere
> - Tamaya
>
> Congratulations to SDAP where all mentors have signed off the report.
>
> Thanks,
> Justin
> -
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org
>


--
Olivier Lamy
http://twitter.com/olamy | http://linkedin.com/in/olamy


ASF Release process

2019-02-06 Thread Felix Cheung
Sharing doc Chris has on the ASF release process

http://plc4x.apache.org/developers/release.html

Thanks Chris!


Re: Interest in release documentation?

2019-02-06 Thread Felix Cheung
Very cool. I am going to share this with another incubator project, I hope you 
don’t mind!



From: Christofer Dutz 
Sent: Tuesday, February 5, 2019 5:02 AM
To: dev@iotdb.apache.org
Subject: Interest in release documentation?

Hi all,

I know this is early, but you can never be early enough when it comes to 
releasing.
We just had our first successful release of PLC4X by a release manager that was 
not myself.
Intentionally I left him alone with only this documentation:

http://plc4x.apache.org/developers/release.html

And he succeeded, as he claimed, without any problems.

As I set up the IoTDB build structurally identically to that of PLC4X, this 
documentation should also apply to IoTDB (I haven’t checked the details though).

Especially the handling of the Apache-Specialties might be interesting for all 
of you, who haven’t done releases for Apache projects yet.

Chris


Re: Request for checking commits before creating a release candidate

2019-02-06 Thread Felix Cheung
For #1 LG generally. What happened to the whole chunk of Apache ones
like Apache
Commons Net?

#2 LGTM

On Tue, Feb 5, 2019 at 9:59 PM Seunghyun Lee  wrote:

> Hi Felix,
>
> Before we cut the release candidate, I have some remaining checks from you.
>
> 1. Can you double check on the update for LICENSE-binary/NOTICE-binary due
> to jersery version upgrade? (A lot of glassfish projects have been moved
> from CDDL to EPL 2.0.)
> https://github.com/apache/incubator-pinot/pull/3791
>
> 2. Can you also comment on my reply on the issue with removing "*-binary"
> files/directories? (refer pinot-assembly.xml file)
> https://github.com/apache/incubator-pinot/pull/3772
>
> I really appreciate your help.
>
> Best,
> Seunghyun
>


Re: Request to join iotdb mailing list

2019-02-04 Thread Felix Cheung
Please email dev-subscr...@iotdb.apache.org




From: Jack Liu 
Sent: Monday, February 4, 2019 12:23 PM
To: dev@iotdb.apache.org
Subject: Request to join iotdb mailing list

I, Huaqing Liu, work at Microsoft and JinRui is my colleague; I have just started
to play with IoTDB. I hope to join the IoTDB dev mailing list and become a
committer. This is my GitHub URL .

Thanks
Huaqing Liu


Re: [DISCUSS] Upgrade built-in Hive to 2.3.4

2019-02-04 Thread Felix Cheung
Likely need a shim (which we should have anyway) because of namespace/import 
changes.

I’m huge +1 on this.



From: Hyukjin Kwon 
Sent: Monday, February 4, 2019 12:27 PM
To: Xiao Li
Cc: Sean Owen; Felix Cheung; Ryan Blue; Marcelo Vanzin; Yuming Wang; dev
Subject: Re: [DISCUSS] Upgrade built-in Hive to 2.3.4

I should check the details and feasibility myself, but to me it sounds fine if 
it doesn't need a big extra effort.

On Tue, 5 Feb 2019, 4:15 am Xiao Li <gatorsm...@gmail.com> wrote:
Yes. When our support/integration with Hive 2.x becomes stable, we can do it in 
Hadoop 2.x profile too, if needed. The whole proposal is to minimize the risk 
and ensure the release stability and quality.

Hyukjin Kwon <gurwls...@gmail.com> wrote on Monday, February 4, 2019 at 12:01 PM:
Xiao, to check if I understood correctly, do you mean the below?

1. Use our fork with Hadoop 2.x profile for now, and use Hive 2.x with Hadoop 
3.x profile.
2. Make another newer version of thrift server by Hive 2.x(?) in Spark side.
3. Target the transition to Hive 2.x completely and slowly later in the future.



On Tuesday, February 5, 2019 at 1:16 AM, Xiao Li <gatorsm...@gmail.com> wrote:
To reduce the impact and risk of upgrading Hive execution JARs, we can just 
upgrade the built-in Hive to 2.x when using the profile of Hadoop 3.x. The 
support of Hadoop 3 will still be experimental in our next release. That means 
the impact and risk are very minimal for most users, who are still using the 
Hadoop 2.x profile.

The code changes in the Spark thrift server are massive; they are risky and hard to 
review. The original code of our Spark thrift server is from hive-service 
1.2.1. To reduce the risk of the upgrade, we can inline the new version. In the 
future, we can get rid of the thrift server completely and build our own 
high-performance JDBC server.

Does this proposal sound good to you?

In the last two weeks, Yuming has been trying out this proposal. Now he is on vacation; 
in China, today is already the Lunar New Year. I would not expect him to reply to 
this email in the next 7 days.

Cheers,

Xiao



Sean Owen <sro...@gmail.com> wrote on Monday, February 4, 2019 at 7:56 AM:
I was unclear from this thread what the objection to these PRs is:

https://github.com/apache/spark/pull/23552
https://github.com/apache/spark/pull/23553

Would we like to specifically discuss whether to merge these or not? I
hear support for it, concerns about continuing to support Hive too,
but I wasn't clear whether those concerns specifically argue against
these PRs.


On Fri, Feb 1, 2019 at 2:03 PM Felix Cheung 
mailto:felixcheun...@hotmail.com>> wrote:
>
> What’s the update and next step on this?
>
> We have real users getting blocked by this issue.
>
>
> 
> From: Xiao Li mailto:gatorsm...@gmail.com>>
> Sent: Wednesday, January 16, 2019 9:37 AM
> To: Ryan Blue
> Cc: Marcelo Vanzin; Hyukjin Kwon; Sean Owen; Felix Cheung; Yuming Wang; dev
> Subject: Re: [DISCUSS] Upgrade built-in Hive to 2.3.4
>
> Thanks for your feedbacks!
>
> Working with Yuming to reduce the risk to stability and quality. Will keep 
> you posted when the proposal is ready.
>
> Cheers,
>
> Xiao
>
> Ryan Blue <rb...@netflix.com> wrote on Wednesday, January 16, 2019 at 9:27 AM:
>>
>> +1 for what Marcelo and Hyukjin said.
>>
>> In particular, I agree that we can't expect Hive to release a version that 
>> is now more than 3 years old just to solve a problem for Spark. Maybe that 
>> would have been a reasonable ask instead of publishing a fork years ago, but 
>> I think this is now Spark's problem.
>>
>> On Tue, Jan 15, 2019 at 9:02 PM Marcelo Vanzin 
>> mailto:van...@cloudera.com>> wrote:
>>>
>>> +1 to that. HIVE-16391 by itself means we're giving up things like
>>> Hadoop 3, and we're also putting the burden on the Hive folks to fix a
>>> problem that we created.
>>>
>>> The current PR is basically a Spark-side fix for that bug. It does
>>> mean also upgrading Hive (which gives us Hadoop 3, yay!), but I think
>>> it's really the right path to take here.
>>>
>>> On Tue, Jan 15, 2019 at 6:32 PM Hyukjin Kwon 
>>> mailto:gurwls...@gmail.com>> wrote:
>>> >
>>> > Resolving HIVE-16391 means Hive releasing a 1.2.x that contains the fixes 
>>> > from our Hive fork (correct me if I am mistaken).
>>> >
>>> > To be honest, and as a personal opinion, that basically asks Hive to take 
>>> > care of Spark's dependency.
>>> > Hive looks to be going ahead with 3.1.x, and no one would use a newer 1.2.x 
>>> > release. In practice, Spark doesn't make 1.6.x releases anymore, for 
>>> > instance,
>>> >

[jira] [Assigned] (SPARK-26603) Update minikube backend in K8s integration tests

2019-02-03 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung reassigned SPARK-26603:


Assignee: Stavros Kontopoulos

> Update minikube backend in K8s integration tests
> 
>
> Key: SPARK-26603
> URL: https://issues.apache.org/jira/browse/SPARK-26603
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Stavros Kontopoulos
>Assignee: Stavros Kontopoulos
>Priority: Major
>
> Minikube status command has changed 
> ([https://github.com/kubernetes/minikube/commit/cb3624dd089e7ab0c03fbfb81f20c2bde43a60f3#diff-bd0534bbb0703b4170d467d074373788])
>  in the latest releases >0.30.
> Old output:
> {quote}minikube status
>  There is a newer version of minikube available (v0.31.0). Download it here:
>  [https://github.com/kubernetes/minikube/releases/tag/v0.31.0]
> To disable this notification, run the following:
>  minikube config set WantUpdateNotification false
>  minikube: 
>  cluster: 
>  kubectl: 
> {quote}
> new output:
> {quote}minikube status
>  host: Running
>  kubelet: Running
>  apiserver: Running
>  kubectl: Correctly Configured: pointing to minikube-vm at 172.31.34.77
> {quote}
> That means users with latest version of minikube will not be able to run the 
> integration tests.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26603) Update minikube backend in K8s integration tests

2019-02-03 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung resolved SPARK-26603.
--
   Resolution: Fixed
Fix Version/s: 3.0.0

> Update minikube backend in K8s integration tests
> 
>
> Key: SPARK-26603
> URL: https://issues.apache.org/jira/browse/SPARK-26603
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Stavros Kontopoulos
>Assignee: Stavros Kontopoulos
>Priority: Major
> Fix For: 3.0.0
>
>
> Minikube status command has changed 
> ([https://github.com/kubernetes/minikube/commit/cb3624dd089e7ab0c03fbfb81f20c2bde43a60f3#diff-bd0534bbb0703b4170d467d074373788])
>  in the latest releases >0.30.
> Old output:
> {quote}minikube status
>  There is a newer version of minikube available (v0.31.0). Download it here:
>  [https://github.com/kubernetes/minikube/releases/tag/v0.31.0]
> To disable this notification, run the following:
>  minikube config set WantUpdateNotification false
>  minikube: 
>  cluster: 
>  kubectl: 
> {quote}
> new output:
> {quote}minikube status
>  host: Running
>  kubelet: Running
>  apiserver: Running
>  kubectl: Correctly Configured: pointing to minikube-vm at 172.31.34.77
> {quote}
> That means users with latest version of minikube will not be able to run the 
> integration tests.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: Can we blacklist the mails from GitBox for dev mailing list?

2019-02-02 Thread Felix Cheung
This hasn’t changed, I think.

Seunghyun, you can open a JIRA on INFRA to request that.



From: Seunghyun Lee 
Sent: Wednesday, January 30, 2019 10:25 PM
To: Daniel Gruno
Cc: us...@infra.apache.org; dev@pinot.apache.org
Subject: Re: Can we blacklist the mails from GitBox for dev mailing list?

We do have com...@pinot.apache.org, which is already getting a lot of mails
for commits to the repository. How about moving it there?

On Wed, Jan 30, 2019 at 10:12 PM Daniel Gruno  wrote:

> On 1/31/19 7:04 AM, Seunghyun Lee wrote:
> > Hi Apache infra team,
> >
> > I'm Seunghyun who's working on Pinot incubating project. We are getting
> > a lot of spam mails from GitBox on all changes in Github and this makes
> > the mail archive hard to find the useful discussion. Can we blacklist
> > GitBox from sending mails to dev@pinot.apache.org
> > ?
>
> No, but you could set up a new mailing list just for PRs/issues and have
> it moved there.
>
> >
> > Please refer tot he following mailing list.
> > https://lists.apache.org/list.html?dev@pinot.apache.org
> >
> > Best,
> > Seunghyun
>
>


Re: [DISCUSS] Upgrade built-in Hive to 2.3.4

2019-02-01 Thread Felix Cheung
What’s the update and next step on this?

We have real users getting blocked by this issue.



From: Xiao Li 
Sent: Wednesday, January 16, 2019 9:37 AM
To: Ryan Blue
Cc: Marcelo Vanzin; Hyukjin Kwon; Sean Owen; Felix Cheung; Yuming Wang; dev
Subject: Re: [DISCUSS] Upgrade built-in Hive to 2.3.4

Thanks for your feedbacks!

Working with Yuming to reduce the risk to stability and quality. Will keep you 
posted when the proposal is ready.

Cheers,

Xiao

Ryan Blue <rb...@netflix.com> wrote on Wednesday, January 16, 2019 at 9:27 AM:
+1 for what Marcelo and Hyukjin said.

In particular, I agree that we can't expect Hive to release a version that is 
now more than 3 years old just to solve a problem for Spark. Maybe that would 
have been a reasonable ask instead of publishing a fork years ago, but I think 
this is now Spark's problem.

On Tue, Jan 15, 2019 at 9:02 PM Marcelo Vanzin 
mailto:van...@cloudera.com>> wrote:
+1 to that. HIVE-16391 by itself means we're giving up things like
Hadoop 3, and we're also putting the burden on the Hive folks to fix a
problem that we created.

The current PR is basically a Spark-side fix for that bug. It does
mean also upgrading Hive (which gives us Hadoop 3, yay!), but I think
it's really the right path to take here.

On Tue, Jan 15, 2019 at 6:32 PM Hyukjin Kwon 
mailto:gurwls...@gmail.com>> wrote:
>
> Resolving HIVE-16391 means Hive releasing a 1.2.x that contains the fixes from 
> our Hive fork (correct me if I am mistaken).
>
> To be honest, and as a personal opinion, that basically asks Hive to take care 
> of Spark's dependency.
> Hive looks to be going ahead with 3.1.x, and no one would use a newer 1.2.x 
> release. In practice, Spark doesn't make 1.6.x releases anymore, for instance.
>
> Frankly, my impression was that this is, honestly, our mistake to fix. Since the 
> Spark community is big enough, I was thinking we should try to fix it by 
> ourselves first.
> I am not saying upgrading is the only way to get through this, but I think we 
> should at least try first and see what's next.
>
> It does, yes, sound riskier to upgrade it on our side, but I think it's worth 
> checking and trying to see if it's possible.
> I think this is a more standard approach than using the fork or asking the Hive 
> side to release another 1.2.x.
>
> If we fail to upgrade it for critical or inevitable reasons, yes, we could find 
> an alternative, but that basically means
> we're going to stay on 1.2.x for a long time (say, until Spark 4.0.0?).
>
> I know this has somehow become sensitive, but to be honest with myself, I think 
> we should give it a try.
>


--
Marcelo


--
Ryan Blue
Software Engineer
Netflix


Re: incubator wiki - Please provide write access to incubator PMC report wiki for Pinot

2019-02-01 Thread Felix Cheung
Thanks Justin!

On Thu, Jan 31, 2019 at 8:23 PM Justin Mclean  wrote:

> Hi,
>
> > Could someone please add user *sbeeram*
>
> Done.
>
> Thanks,
> Justin
>
> -
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org
>
>


incubator wiki - Please provide write access to incubator PMC report wiki for Pinot

2019-01-31 Thread Felix Cheung
Could someone please add user *sbeeram*



On Thu, Jan 31, 2019 at 8:06 AM Felix Cheung  wrote:

> (A little strange as I didn’t see this thread before, so I’m trying to
> send it)
>
>
> On Thu, Jan 31, 2019 at 8:02 AM Sunitha Beeram 
> wrote:
>
>> +Felix
>>
>>
>>
>> *From: *Sunitha Beeram 
>> *Date: *Wednesday, January 30, 2019 at 2:37 PM
>> *To: *"general@incubator.apache.org" 
>> *Cc: *"priv...@pinot.apache.org" 
>> *Subject: *Re: Please provide write access to incubator PMC report wiki
>> for Pinot
>>
>>
>>
>> Hi,
>>
>>
>>
>> Sending this request again to see if the access can be granted.
>>
>>
>>
>> ~Sunitha
>> --
>>
>> *From:* Sunitha Beeram
>> *Sent:* Tuesday, January 29, 2019 9:56:02 AM
>> *To:* general@incubator.apache.org
>> *Cc:* priv...@pinot.apache.org
>> *Subject:* Re: Please provide write access to incubator PMC report wiki
>> for Pinot
>>
>>
>>
>> [Adding the right alias this time.]
>>
>>
>>
>> Could you please provide write access to user *sbeeram *so I can update
>> the podling report?
>>
>>
>>
>> Thanks,
>>
>> Sunitha
>>
>>
>> --
>>
>> *From:* Sunitha Beeram
>> *Sent:* Tuesday, January 29, 2019 9:52:58 AM
>> *To:* gene...@pinot.apache.org
>> *Cc:* priv...@pinot.apache.org
>> *Subject:* Please provide write access to incubator PMC report wiki for
>> Pinot
>>
>>
>>
>> Could you please provide write access to user *sbeeram *so I can update
>> the podling report?
>>
>>
>>
>> Thanks,
>>
>> Sunitha
>>
>


Re: [IP CLEARANCE] Apache Arrow Rust DataFusion

2019-01-31 Thread Felix Cheung
+1
(Checked IP clearance)


On Thu, Jan 31, 2019 at 12:57 PM Wes McKinney  wrote:

> Apache Arrow is receiving a donation of "DataFusion" a Rust library
> that computes analytical queries on Arrow columnar memory. It is
> intended to work together with the Arrow Rust library [1].
>
> Please vote to approve this contribution.
>
> This is a lazy consensus majority vote, per the IP clearance process
> [2], open for at least 72 hours.
>
> Wes
>
> [1]: http://incubator.apache.org/ip-clearance/arrow-rust-datafusion.html
> [2] http://incubator.apache.org/ip-clearance/
>
> -
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org
>
>


Re: Please provide write access to incubator PMC report wiki for Pinot

2019-01-31 Thread Felix Cheung
(A little strange as I didn’t see this thread before, so I’m trying to send
it)


On Thu, Jan 31, 2019 at 8:02 AM Sunitha Beeram  wrote:

> +Felix
>
>
>
> *From: *Sunitha Beeram 
> *Date: *Wednesday, January 30, 2019 at 2:37 PM
> *To: *"general@incubator.apache.org" 
> *Cc: *"priv...@pinot.apache.org" 
> *Subject: *Re: Please provide write access to incubator PMC report wiki
> for Pinot
>
>
>
> Hi,
>
>
>
> Sending this request again to see if the access can be granted.
>
>
>
> ~Sunitha
> --
>
> *From:* Sunitha Beeram
> *Sent:* Tuesday, January 29, 2019 9:56:02 AM
> *To:* general@incubator.apache.org
> *Cc:* priv...@pinot.apache.org
> *Subject:* Re: Please provide write access to incubator PMC report wiki
> for Pinot
>
>
>
> [Adding the right alias this time.]
>
>
>
> Could you please provide write access to user *sbeeram *so I can update
> the podling report?
>
>
>
> Thanks,
>
> Sunitha
>
>
> --
>
> *From:* Sunitha Beeram
> *Sent:* Tuesday, January 29, 2019 9:52:58 AM
> *To:* gene...@pinot.apache.org
> *Cc:* priv...@pinot.apache.org
> *Subject:* Please provide write access to incubator PMC report wiki for
> Pinot
>
>
>
> Could you please provide write access to user *sbeeram *so I can update
> the podling report?
>
>
>
> Thanks,
>
> Sunitha
>


Re: Unapproved releases

2019-01-30 Thread Felix Cheung
Thanks for the prompt action!


On Wed, Jan 30, 2019 at 2:03 PM Sunitha Beeram  wrote:

> The releases/tags have been cleanedup.
> https://github.com/apache/incubator-pinot/releases is now empty.
>
> Thanks
> Sunitha
> --
> *From:* Sunitha Beeram 
> *Sent:* Wednesday, January 30, 2019 9:55:50 AM
>
> *To:* dev@pinot.apache.org
> *Cc:* kisho...@apache.org; Felix Cheung; Subbu Subramaniam
> *Subject:* Re: Unapproved releases
>
> We will be deleting all old releases/tags using the "git push --delete
> origin ". Will do this around noon for the tags that show up in the
> output of "git tag".
>
>
> Will update this thread once done.
>
> 
> From: Justin Mclean 
> Sent: Tuesday, January 29, 2019 5:20:36 PM
> To: dev@pinot.apache.org
> Cc: kisho...@apache.org; Felix Cheung; Subbu Subramaniam
> Subject: Re: Unapproved releases
>
> Hi,
>
> > Would it be possible to get an exception only for the latest one until
> we make our first release (expected end of this quarter)?
>
> IMO you would need to ask VP legal for permission. My guess (but I could
> be wrong) is that they would say no and just suggest that you just make an
> official release.
>
> I do know of a couple of projects that have been allowed to make
> unofficial releases shortly after entering incubation, but they asked
> before making the release and it was clearly marked so to not cause any
> confusion with official releases.
>
> ASF release policy is quite clear:
> "Projects MUST direct outsiders towards official releases rather than raw
> source repositories, nightly builds, snapshots, release candidates, or any
> other similar packages."
>
> Thanks,
> Justin
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@pinot.apache.org
> For additional commands, e-mail: dev-h...@pinot.apache.org
>
>


Re: Unapproved releases

2019-01-29 Thread Felix Cheung
IMO the last one is from after incubation, which is the biggest problem here.
The other releases were from 2017.

Is there a way this user can make do with a commit tag and build from source?

Also, I'd encourage discussions or requests with this user to be on
dev@ - the community can benefit from that conversation and its
performance work as well. And perhaps we could get to an acceptable
alternative there.


On Tue, Jan 29, 2019 at 5:00 PM Subbu Subramaniam 
wrote:

> Hi Felix,
>
> Thanks for quick response.
>
> The one from Dec was made for a specific user, so we need to check with
> them (at least) before removing. I will start that process right away. It
> was made for them to do some performance measurements on Pinot. The
> previous one was too old, and did not have most new features we added.
>
> We can certainly remove the older versions from that page. Would it be
> possible to get an exception only for the latest one until we make our
> first release (expected end of this quarter)?
>
> Yes, we do plan to go to branch-based releases, much easier to manage
> without stopping checkin to the trunk.
>
> -Subbu
> --
> *From:* Felix Cheung 
> *Sent:* Tuesday, January 29, 2019 4:52 PM
> *To:* Subbu Subramaniam
> *Cc:* Justin Mclean; dev@pinot.apache.org; kisho...@apache.org
> *Subject:* Re: Unapproved releases
>
> Hi Subbu
>
> How are these releases made and what are they useful for?
>
> Generally, it's not good to have unofficial, unvoted releases, and having
> these visible links on GitHub can be confusing. To me the biggest concern
> is that GitHub doesn't give you a way to note that these are unofficial and
> unsupported releases. But even if there were a place to write such a note, it
> could still be confusing, as this is the same place where other non-ASF
> projects publish their releases.
>
> Can I suggest removing these for now?
>
> I think you can mitigate this with a tag or commit ID for now. Hopefully the
> community can build from source from these point-in-time references.
>
> Also, it might be a good move to adopt a version-branching structure. This is
> commonly done in other projects, and it gives easy access to point-in-time
> references as well as helping with future releases and maintenance.
>
> (I see there is also one release from Dec 2018 - that’s after entering
> incubation AFAIK)
>
>
>
> On Tue, Jan 29, 2019 at 4:30 PM Subbu Subramaniam <
> ssubraman...@linkedin.com> wrote:
>
> Hi Felix/Kishore,
>
> Justin has pointed out that we (Pinot) have snapshot releases that are not
> through the apache web site, and in his opinion these need to be removed.
> He has suggested that we discuss with our mentors and put out an
> appropriate note in this month's podling report.
>
> IIRC these releases were present before we incubated into apache.  We
> would like to keep them there until we get the first release out via apache.
> We are working hard to get the first release out this quarter.
>
> Do you suggest that we take them out, or let them be until the end of the
> quarter. We don't know if there are users that are using these images.
>
> thanks,
>
> -Subbu
> --
> *From:* Justin Mclean 
> *Sent:* Tuesday, January 29, 2019 3:09 PM
> *To:* Subbu Subramaniam
> *Cc:* dev@pinot.apache.org
> *Subject:* Re: Unapproved releases
>
> Hi,
>
> Is it sufficient if we write that the snapshots in the links are
> unofficial snapshots of Pinot, and that we are working on a formal apache
> release?
>
>
> No, you are not allowed to make unofficial releases that are public in this
> way. IMO they need to be removed.
>
> It would be best to discuss this with your mentors and see what they have
> to say.
>
> Thanks,
> Justin
>
>


Re: Unapproved releases

2019-01-29 Thread Felix Cheung
Hi Subbu

How are these releases made and what are they useful for?

Generally, it's not good to have unofficial, unvoted releases, and having
these visible links on GitHub can be confusing. To me the biggest concern
is that GitHub doesn't give you a way to note that these are unofficial and
unsupported releases. But even if there were a place to write such a note, it
could still be confusing, as this is the same place where other non-ASF
projects publish their releases.

Can I suggest removing these for now?

I think you can mitigate this with a tag or commit ID for now. Hopefully the
community can build from source from these point-in-time references.

Also, it might be a good move to adopt a version-branching structure. This is
commonly done in other projects, and it gives easy access to point-in-time
references as well as helping with future releases and maintenance.

(I see there is also one release from Dec 2018 - that’s after entering
incubation AFAIK)



On Tue, Jan 29, 2019 at 4:30 PM Subbu Subramaniam 
wrote:

> Hi Felix/Kishore,
>
> Justin has pointed out that we (Pinot) have snapshot releases that are not
> through the apache web site, and in his opinion these need to be removed.
> He has suggested that we discuss with our mentors and put out an
> appropriate note in this month's podling report.
>
> IIRC these releases were present before we incubated into apache.  We
> would like to keep them there until we get the first release out via apache.
> We are working hard to get the first release out this quarter.
>
> Do you suggest that we take them out, or let them be until the end of the
> quarter. We don't know if there are users that are using these images.
>
> thanks,
>
> -Subbu
> --
> *From:* Justin Mclean 
> *Sent:* Tuesday, January 29, 2019 3:09 PM
> *To:* Subbu Subramaniam
> *Cc:* dev@pinot.apache.org
> *Subject:* Re: Unapproved releases
>
> Hi,
>
> Is it sufficient if we write that the snapshots in the links are
> unofficial snapshots of Pinot, and that we are working on a formal apache
> release?
>
>
> No, you are not allowed to make unofficial releases that are public in this
> way. IMO they need to be removed.
>
> It would be best to discuss this with your mentors and see what they have
> to say.
>
> Thanks,
> Justin
>


Re: Missing SparkR in CRAN

2019-01-24 Thread Felix Cheung
Yes it was discussed on dev@. We are waiting for 2.3.3 to release to
resubmit.


On Thu, Jan 24, 2019 at 5:33 AM Hyukjin Kwon  wrote:

> Hi all,
>
> I happened to find SparkR is missing in CRAN. See
> https://cran.r-project.org/web/packages/SparkR/index.html
>
> I remember I saw some threads about this in spark-dev mailing list a long
> long ago IIRC. Is it in progress to fix it somewhere? or is it something I
> misunderstood?
>


Re: Requesting a review for updating license & notice information

2019-01-23 Thread Felix Cheung
Actually I’m going to retract the whole thing I said.

Seems like it should carry it under license for:
https://github.com/apache/spark/commit/bf4199e261c3c8dd2970e2a154c97b46fb339f02


From: Felix Cheung 
Sent: Wednesday, January 23, 2019 9:20 PM
To: dev@pinot.apache.org; dev@pinot.apache.org
Subject: Re: Requesting a review for updating license & notice information

Hi - I believe it's OK not to include the standard permissive licenses too. I think 
there are some pointers on that in the PR where we discussed it.



From: kishore g 
Sent: Wednesday, January 23, 2019 9:18 PM
To: dev@pinot.apache.org
Subject: Re: Requesting a review for updating license & notice information

I think that should work. Olivier, do you see any issues with that?

On Wed, Jan 23, 2019 at 5:51 PM Seunghyun Lee  wrote:

> Hi Felix and Kishore,
>
> Can you guys confirm that if I can remove copied license files for MIT, BSD
> based on http://www.apache.org/dev/licensing-howto.html#permissive-deps ?
>
> My understanding from the above document is that for MIT, BSD dependencies,
> it is sufficient to add pointers in LICENSE file.
>
> However, I'm a bit confused given that the example actually points the full
> license
>
> This product bundles SuperWidget 1.2.3, which is available under a
> "3-clause BSD" license. For details, see *deps/superwidget/.*
>
>
> As long as we don't violate the license terms, I can go ahead with removing
> the copied license files in "/licenses" for MIT- and BSD-related libraries.
>
> Best,
> Seunghyun
>
> On Fri, Jan 18, 2019 at 9:53 PM Felix Cheung 
> wrote:
>
> > Thanks for doing that and sending this note.
> >
> > I will help check today.
> >
> >
> >
> > 
> > From: Seunghyun Lee 
> > Sent: Friday, January 18, 2019 3:42 PM
> > To: dev@pinot.apache.org
> > Subject: Requesting a review for updating license & notice information
> >
> > Hi Mentors,
> >
> > I have recently tracked down all dependencies that we bundle for our
> > distribution and came up with license, notice files for Pinot project. I
> > followed Apache Spark's approach since it provides the detailed
> > documentation on how to build license, notice files.
> >
> > Can someone look into the following pull request so that we can have some
> > initial feedbacks before we start to work on the release process?
> > https://github.com/apache/incubator-pinot/pull/3722
> >
> > Best,
> > Seunghyun
> >
>


[jira] [Commented] (SPARK-24615) Accelerator-aware task scheduling for Spark

2019-01-23 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750751#comment-16750751
 ] 

Felix Cheung commented on SPARK-24615:
--

We are interested to know as well.

 

[~mengxr] touched on this maybe Oct/Nov 2018, but I haven't heard anything else 
since.

> Accelerator-aware task scheduling for Spark
> ---
>
> Key: SPARK-24615
> URL: https://issues.apache.org/jira/browse/SPARK-24615
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Saisai Shao
>Priority: Major
>  Labels: Hydrogen, SPIP
>
> In the machine learning area, accelerator cards (GPU, FPGA, TPU) are 
> predominant compared to CPUs. To make the current Spark architecture work 
> with accelerator cards, Spark itself should understand the existence of 
> accelerators and know how to schedule tasks onto the executors where 
> accelerators are equipped.
> Spark's current scheduler schedules tasks based on the locality of the data 
> plus the availability of CPUs. This will introduce some problems when scheduling 
> tasks that require accelerators.
>  # CPU cores are usually more numerous than accelerators on one node, so using CPU cores 
> to schedule accelerator-required tasks will introduce a mismatch.
>  # In one cluster, we always assume that a CPU is equipped in each node, but 
> this is not true of accelerator cards.
>  # The existence of heterogeneous tasks (accelerator required or not) 
> requires the scheduler to schedule tasks in a smart way.
> So here we propose to improve the current scheduler to support heterogeneous 
> tasks (accelerator required or not). This can be part of the work of Project 
> Hydrogen.
> Details is attached in google doc. It doesn't cover all the implementation 
> details, just highlight the parts should be changed.
>  
> CC [~yanboliang] [~merlintang]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: Requesting a review for updating license & notice information

2019-01-23 Thread Felix Cheung
Hi - I believe it's OK not to include the standard permissive licenses too. I think 
there are some pointers on that in the PR where we discussed it.



From: kishore g 
Sent: Wednesday, January 23, 2019 9:18 PM
To: dev@pinot.apache.org
Subject: Re: Requesting a review for updating license & notice information

I think that should work. Olivier, do you see any issues with that?

On Wed, Jan 23, 2019 at 5:51 PM Seunghyun Lee  wrote:

> Hi Felix and Kishore,
>
> Can you guys confirm that if I can remove copied license files for MIT, BSD
> based on http://www.apache.org/dev/licensing-howto.html#permissive-deps ?
>
> My understanding from the above document is that for MIT, BSD dependencies,
> it is sufficient to add pointers in LICENSE file.
>
> However, I'm a bit confused given that the example actually points the full
> license
>
> This product bundles SuperWidget 1.2.3, which is available under a
> "3-clause BSD" license. For details, see *deps/superwidget/.*
>
>
> As long as we don't violate the license terms, I can go ahead with removing
> the copied license files in "/licenses" for MIT- and BSD-related libraries.
>
> Best,
> Seunghyun
>
> On Fri, Jan 18, 2019 at 9:53 PM Felix Cheung 
> wrote:
>
> > Thanks for doing that and sending this note.
> >
> > I will help check today.
> >
> >
> >
> > 
> > From: Seunghyun Lee 
> > Sent: Friday, January 18, 2019 3:42 PM
> > To: dev@pinot.apache.org
> > Subject: Requesting a review for updating license & notice information
> >
> > Hi Mentors,
> >
> > I have recently tracked down all dependencies that we bundle for our
> > distribution and came up with license, notice files for Pinot project. I
> > followed Apache Spark's approach since it provides the detailed
> > documentation on how to build license, notice files.
> >
> > Can someone look into the following pull request so that we can have some
> > initial feedbacks before we start to work on the release process?
> > https://github.com/apache/incubator-pinot/pull/3722
> >
> > Best,
> > Seunghyun
> >
>


Re: I have trained a ML model, now what?

2019-01-23 Thread Felix Cheung
Please comment in the JIRA/SPIP if you are interested! We can see the community 
support for a proposal like this.



From: Pola Yao 
Sent: Wednesday, January 23, 2019 8:01 AM
To: Riccardo Ferrari
Cc: Felix Cheung; User
Subject: Re: I have trained a ML model, now what?

Hi Riccardo,

Right now, Spark does not support low-latency predictions in Production. MLeap 
is an alternative and it's been used in many scenarios. But it's good to see 
that Spark Community has decided to provide such support.

On Wed, Jan 23, 2019 at 7:53 AM Riccardo Ferrari 
mailto:ferra...@gmail.com>> wrote:
Felix, thank you very much for the link. Much appreciated.

The attached PDF is very interesting, I found myself evaluating many of the 
scenarios described in Q3. It's unfortunate the proposal is not being worked 
on, would be great to see that part of the code base.

It is cool to see big players like Uber trying to make Open Source better, 
thanks!


On Tue, Jan 22, 2019 at 5:24 PM Felix Cheung 
mailto:felixcheun...@hotmail.com>> wrote:
About deployment/serving

SPIP
https://issues.apache.org/jira/browse/SPARK-26247



From: Riccardo Ferrari mailto:ferra...@gmail.com>>
Sent: Tuesday, January 22, 2019 8:07 AM
To: User
Subject: I have trained a ML model, now what?

Hi list!

I am writing here to hear about your experience putting Spark ML models into 
production at scale.

I know it is a very broad topic with many different faces depending on the 
use-case, requirements, user base and whatever is involved in the task. Still 
I'd like to open a thread about this topic that is as important as properly 
training a model and I feel is often neglected.

The task is serving web users with predictions and the main challenge I see is 
making it agile and swift.

I think there are mainly 3 general categories of such deployment that can be 
described as:

  *   Offline/Batch: Load a model, perform the inference, store the results in 
some datastore (DB, indexes, ...) - see the sketch at the end of this message
  *   Spark in the loop: Having a long running Spark context exposed in some 
way, this include streaming as well as some custom application that wraps the 
context.
  *   Use a different technology to load the Spark MLlib model and run the 
inference pipeline. I have read about MLeap and other PMML based solutions.

I would love to hear about opensource solutions and possibly without requiring 
cloud provider specific framework/component.

Again I am aware each of the previous category have benefits and drawback, so 
what would you pick? Why? and how?

Thanks!
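
To make the first (Offline/Batch) category above concrete, here is a minimal sketch using Spark MLlib's built-in persistence; the paths and the "id" column are hypothetical, and this is only one of many possible setups:

import org.apache.spark.ml.PipelineModel
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("batch-scoring").getOrCreate()

// Load a previously trained and saved pipeline, score a batch of records,
// and persist the predictions to a datastore (here simply a Parquet path).
val model  = PipelineModel.load("hdfs:///models/my_pipeline")    // hypothetical path
val input  = spark.read.parquet("hdfs:///data/to_score")         // hypothetical input
val scored = model.transform(input).select("id", "prediction")

scored.write.mode("overwrite").parquet("hdfs:///output/predictions")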


[jira] [Commented] (SPARK-26679) Deconflict spark.executor.pyspark.memory and spark.python.worker.memory

2019-01-23 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750696#comment-16750696
 ] 

Felix Cheung commented on SPARK-26679:
--

I find the fraction configs very confusing; there is much misinformation about them, 
and wrongly hardcoded configs. Anyway...

> Deconflict spark.executor.pyspark.memory and spark.python.worker.memory
> ---
>
> Key: SPARK-26679
> URL: https://issues.apache.org/jira/browse/SPARK-26679
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 2.4.0
>Reporter: Ryan Blue
>Priority: Major
>
> In 2.4.0, spark.executor.pyspark.memory was added to limit the total memory 
> space of a python worker. There is another RDD setting, 
> spark.python.worker.memory that controls when Spark decides to spill data to 
> disk. These are currently similar, but not related to one another.
> PySpark should probably use spark.executor.pyspark.memory to limit or default 
> the setting of spark.python.worker.memory because the latter property 
> controls spilling and should be lower than the total memory limit. Renaming 
> spark.python.worker.memory would also help clarity because it sounds like it 
> should control the limit, but is more like the JVM setting 
> spark.memory.fraction.
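To relate the two settings, a small hedged sketch (the values are made up; the point is only that the spill threshold is meant to sit below the overall limit):

import org.apache.spark.SparkConf

val conf = new SparkConf()
  // Overall memory the Python worker processes may use per executor (added in 2.4.0):
  .set("spark.executor.pyspark.memory", "2g")
  // Threshold at which Python-side aggregation/sorting spills to disk; should stay
  // comfortably below the limit above:
  .set("spark.python.worker.memory", "512m")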



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: [VOTE] Release Apache Zeppelin 0.8.1 (RC1)

2019-01-22 Thread Felix Cheung
Thanks Jeff. I just tried to reach the rc1 tarball and realized it's gone ;)



From: Jeff Zhang 
Sent: Tuesday, January 22, 2019 6:17 PM
To: users; dev
Subject: Re: [VOTE] Release Apache Zeppelin 0.8.1 (RC1)

The vote is passed with 10 +1 ( 4 binding +1), Thanks everyone. I will
continue the release process. Will announce it soon.


Jongyoul Lee wrote on Wednesday, January 23, 2019 at 6:01 AM:

> +1
>
> Thanks Jeff.
>
> On Tue, Jan 22, 2019 at 8:05 PM moon soo Lee  wrote:
>
>> +1
>>
>> Thanks Jeff
>>
>> On Tue, Jan 22, 2019 at 01:07 Terry Wang  wrote:
>>
>>> +1, thanks for your effort
>>>
>>> > On January 20, 2019 at 1:33 AM, antonkul...@gmail.com wrote:
>>> >
>>> > +1
>>> > cause, I desperately need Spark 2.4.0 support
>>> >
>>> > On 2019/01/17 05:40:17, Jeff Zhang  wrote:
>>> >> I will start with my +1
>>> >>
>>> >> Jeff Zhang wrote on Thursday, January 17, 2019 at 11:28 AM:
>>> >>
>>> >>> Hi folks,
>>> >>>
>>> >>> I propose the following RC to be released for the Apache Zeppelin
>>> >>> 0.8.1 release.
>>> >>>
>>> >>>
>>> >>> The commit id is c46f55f1d81df944fd1b69a7ccb68d0647294543 :
>>> >>>
>>> >>>
>>> https://gitbox.apache.org/repos/asf?p=zeppelin.git;a=commit;h=c46f55f1d81df944fd1b69a7ccb68d0647294543
>>> >>>
>>> >>>
>>> >>> This corresponds to the tag: v0.8.1-rc1 : *
>>> https://gitbox.apache.org/repos/asf?p=zeppelin.git;a=shortlog;h=refs/tags/v0.8.1-rc1
>>> <
>>> https://gitbox.apache.org/repos/asf?p=zeppelin.git;a=shortlog;h=refs/tags/v0.8.1-rc1
>>> >*
>>> >>>
>>> >>>
>>> >>> The release archives (tgz), signature, and checksums are here *
>>> https://dist.apache.org/repos/dist/dev/zeppelin/zeppelin-0.8.1-rc1/ <
>>> https://dist.apache.org/repos/dist/dev/zeppelin/zeppelin-0.8.1-rc1/>*
>>> >>>
>>> >>> The release candidate consists of the following source distribution
>>> archive
>>> >>> zeppelin-0.8.1.tgz
>>> >>>
>>> >>> In addition, the following supplementary binary distributions are
>>> provided
>>> >>> for user convenience at the same location
>>> >>> zeppelin-0.8.1-bin-all.tgz
>>> >>>
>>> >>>
>>> >>> The maven artifacts are here
>>> https://repository.apache.org/content/repositories/orgapachezeppelin-1269/org/apache/zeppelin/
>>> >>> You can find the KEYS file here:
>>> >>>
>>> >>> https://dist.apache.org/repos/dist/release/zeppelin/KEYS
>>> >>>
>>> >>> Release notes available at
>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12343240==12316221
>>> >>> Vote will be open for next 72 hours (close at 7pm 19/Jan PDT).
>>> >>>
>>> >>> [ ] +1 approve
>>> >>> [ ] 0 no opinion
>>> >>> [ ] -1 disapprove (and reason why)
>>> >>>
>>> >>>
>>> >>>
>>> >>
>>> >> --
>>> >> Best Regards
>>> >>
>>> >> Jeff Zhang
>>> >>
>>>
>>>
>
> --
> 이종열, Jongyoul Lee, 李宗烈
> http://madeng.net
>


--
Best Regards

Jeff Zhang


Re: I have trained a ML model, now what?

2019-01-22 Thread Felix Cheung
About deployment/serving

SPIP
https://issues.apache.org/jira/browse/SPARK-26247



From: Riccardo Ferrari 
Sent: Tuesday, January 22, 2019 8:07 AM
To: User
Subject: I have trained a ML model, now what?

Hi list!

I am writing here to hear about your experience putting Spark ML models into 
production at scale.

I know it is a very broad topic with many different faces depending on the 
use-case, requirements, user base and whatever is involved in the task. Still 
I'd like to open a thread about this topic that is as important as properly 
training a model and I feel is often neglected.

The task is serving web users with predictions and the main challenge I see is 
making it agile and swift.

I think there are mainly 3 general categories of such deployment that can be 
described as:

  *   Offline/Batch: Load a model, perform the inference, store the results in 
some datastore (DB, indexes, ...)
  *   Spark in the loop: Having a long running Spark context exposed in some 
way, this include streaming as well as some custom application that wraps the 
context.
  *   Use a different technology to load the Spark MLlib model and run the 
inference pipeline. I have read about MLeap and other PMML based solutions.

I would love to hear about opensource solutions and possibly without requiring 
cloud provider specific framework/component.

Again I am aware each of the previous category have benefits and drawback, so 
what would you pick? Why? and how?

Thanks!


Re: Make proactive check for closure serializability optional?

2019-01-21 Thread Felix Cheung
Agreed on the pros/cons, especially since the driver could be the data science notebook.
Is it worthwhile making it configurable?



From: Sean Owen 
Sent: Monday, January 21, 2019 10:42 AM
To: Reynold Xin
Cc: dev
Subject: Re: Make proactive check for closure serializability optional?

None except the bug / PR I linked to, which is really just a bug in
the RowMatrix implementation; a 2GB closure isn't reasonable.
I doubt it's much overhead in the common case, because closures are
small and this extra check happens once per execution of the closure.

I can also imagine middle-ground cases where people are dragging along
largeish 10MB closures (like, a model or some data) and this could add
non-trivial memory pressure on the driver. They should be broadcasting
those things, sure.

Given just that I'd leave it alone, but was wondering if anyone had
ever had the same thought or more arguments that it should be
disable-able. In 'production' one would imagine all the closures do
serialize correctly and so this is just a bit overhead that could be
skipped.
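
As a small illustration of the trade-off being discussed (the class and values are made up; this is a sketch of the behavior, not a proposal):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()
val sc = spark.sparkContext

class Handle(val id: Int)            // deliberately not Serializable
val handle = new Handle(42)

// With the proactive check, the next line fails with "Task not serializable"
// at the map() call itself, before any task runs:
//   sc.parallelize(1 to 10).map(x => x + handle.id).collect()

// A largeish lookup structure is better broadcast than captured in the closure,
// so it is shipped once per executor rather than carried inside every closure:
val lookup = sc.broadcast((1 to 1000000).map(i => i -> i * 2).toMap)
val result = sc.parallelize(1 to 10).map(x => lookup.value.getOrElse(x, 0)).collect()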

On Mon, Jan 21, 2019 at 12:17 PM Reynold Xin  wrote:
>
> Did you actually observe a perf issue?
>
> On Mon, Jan 21, 2019 at 10:04 AM Sean Owen  wrote:
>>
>> The ClosureCleaner proactively checks that closures passed to
>> transformations like RDD.map() are serializable, before they're
>> executed. It does this by just serializing it with the JavaSerializer.
>>
>> That's a nice feature, although there's overhead in always trying to
>> serialize the closure ahead of time, especially if the closure is
>> large. It shouldn't be large, usually. But I noticed it when coming up
>> with this fix: https://github.com/apache/spark/pull/23600
>>
>> It made me wonder, should this be optional, or even not the default?
>> Closures that don't serialize still fail, just later when an action is
>> invoked. I don't feel strongly about it, just checking if anyone had
>> pondered this before.
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Release Apache Spark 2.3.3 (RC1)

2019-01-20 Thread Felix Cheung
+1

My focus is on R (sorry, couldn't cross-validate what Sean is seeing)

tested:
reviewed doc
R package test
win-builder, r-hub
Tarball/package signature




From: Takeshi Yamamuro 
Sent: Thursday, January 17, 2019 6:49 PM
To: Spark dev list
Subject: [VOTE] Release Apache Spark 2.3.3 (RC1)

Please vote on releasing the following candidate as Apache Spark version 2.3.3.

The vote is open until January 20 8:00PM (PST) and passes if a majority +1 PMC 
votes are cast, with
a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 2.3.3
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v2.3.3-rc1 (commit 
b5ea9330e3072e99841270b10dc1d2248127064b):
https://github.com/apache/spark/tree/v2.3.3-rc1

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.3.3-rc1-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1297

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.3.3-rc1-docs/

The list of bug fixes going into 2.3.3 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12343759

FAQ

=
How can I help test this release?
=

If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running on this release candidate, then
reporting any regressions.

If you're working in PySpark you can set up a virtual env and install
the current RC and see if anything important breaks, in the Java/Scala
you can add the staging repository to your projects resolvers and test
with the RC (make sure to clean up the artifact cache before/after so
you don't end up building with a out of date RC going forward).

===
What should happen to JIRA tickets still targeting 2.3.3?
===

The current list of open tickets targeted at 2.3.3 can be found at:
https://issues.apache.org/jira/projects/SPARK and search for "Target Version/s" 
= 2.3.3

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else please retarget to an
appropriate release.

==
But my bug isn't fixed?
==

In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is something which is a regression
that has not been correctly targeted please ping me or a committer to
help target the issue.

--
---
Takeshi Yamamuro


Re: [DISCUSS] Identifiers with multi-catalog support

2019-01-20 Thread Felix Cheung
+1, I like Ryan's last mail. Thank you for putting it clearly (it should be a 
spec/SPIP!)

I agree with and understand the need for a 3-part ID. However, I don't think we should 
assume that it must be, or can only be, 3 parts long. Once the 
catalog is identified (i.e., the first part), the catalog should be responsible 
for resolving the namespace or schema, etc. I also agree a path is a good idea to add, 
to support the file-based variant. Should the separator be optional (perhaps in *space), 
to keep this extensible (it might not always be '.')?

Also, this whole scheme will need to play nice with column identifiers as well.



From: Ryan Blue 
Sent: Thursday, January 17, 2019 11:38 AM
To: Spark Dev List
Subject: Re: [DISCUSS] Identifiers with multi-catalog support

Any discussion on how Spark should manage identifiers when multiple catalogs 
are supported?

I know this is an area where a lot of people are interested in making progress, 
and it is a blocker for both multi-catalog support and CTAS in DSv2.

On Sun, Jan 13, 2019 at 2:22 PM Ryan Blue 
mailto:rb...@netflix.com>> wrote:

I think that the solution to this problem is to mix the two approaches by 
supporting 3 identifier parts: catalog, namespace, and name, where namespace 
can be an n-part identifier:

type Namespace = Seq[String]
case class CatalogIdentifier(space: Namespace, name: String)


This allows catalogs to work with the hierarchy of the external store, but the 
catalog API only requires a few discovery methods to list namespaces and to 
list each type of object in a namespace.

def listNamespaces(): Seq[Namespace]
def listNamespaces(space: Namespace, prefix: String): Seq[Namespace]
def listTables(space: Namespace): Seq[CatalogIdentifier]
def listViews(space: Namespace): Seq[CatalogIdentifier]
def listFunctions(space: Namespace): Seq[CatalogIdentifier]


The methods to list tables, views, or functions, would only return identifiers 
for the type queried, not namespaces or the other objects.

The SQL parser would be updated so that identifiers are parsed to 
UnresolvedIdentifier(parts: Seq[String]), and resolution would work like this 
pseudo-code:

def resolveIdentifier(ident: UnresolvedIdentifier): (CatalogPlugin, CatalogIdentifier) = {
  val maybeCatalog = sparkSession.catalog(ident.parts.head)
  ident.parts match {
    case Seq(catalogName, *space, name) if maybeCatalog.isDefined =>
      (maybeCatalog.get, CatalogIdentifier(space, name))
    case Seq(*space, name) =>
      (sparkSession.defaultCatalog, CatalogIdentifier(space, name))
  }
}
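
As a concrete, self-contained rendering of that split (added for illustration; the catalog names are made up and the helper is not part of the proposal itself):

// Re-declared here so the snippet runs on its own.
type Namespace = Seq[String]
case class CatalogIdentifier(space: Namespace, name: String)

// Split an n-part reference into (catalog name, identifier), falling back to a
// default catalog when the first part is not a configured catalog.
def split(parts: Seq[String], knownCatalogs: Set[String], defaultCatalog: String)
    : (String, CatalogIdentifier) = parts match {
  case head +: rest if knownCatalogs.contains(head) && rest.nonEmpty =>
    (head, CatalogIdentifier(rest.init, rest.last))
  case _ =>
    (defaultCatalog, CatalogIdentifier(parts.init, parts.last))
}

split(Seq("prod", "a", "b", "t"), Set("prod"), "default")
// -> ("prod", CatalogIdentifier(Seq("a", "b"), "t"))
split(Seq("a", "b", "t"), Set("prod"), "default")
// -> ("default", CatalogIdentifier(Seq("a", "b"), "t"))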


I think this is a good approach because it allows Spark users to reference or 
discovery any name in the hierarchy of an external store, it uses a few 
well-defined methods for discovery, and makes name hierarchy a user concern.

  *   SHOW (DATABASES|SCHEMAS|NAMESPACES) would return the result of 
listNamespaces()
  *   SHOW NAMESPACES LIKE a.b% would return the result of 
listNamespaces(Seq("a"), "b")
  *   USE a.b would set the current namespace to Seq("a", "b")
  *   SHOW TABLES would return the result of listTables(currentNamespace)

Also, I think that we could generalize this a little more to support path-based 
tables by adding a path to CatalogIdentifier, either as a namespace or as a 
separate optional string. Then, the identifier passed to a catalog would work 
for either a path-based table or a catalog table, without needing a path-based 
catalog API.

Thoughts?

On Sun, Jan 13, 2019 at 1:38 PM Ryan Blue 
mailto:rb...@netflix.com>> wrote:

In the DSv2 sync up, we tried to discuss the Table metadata proposal but were 
side-tracked on its use of TableIdentifier. There were good points about how 
Spark should identify tables, views, functions, etc, and I want to start a 
discussion here.

Identifiers are orthogonal to the TableCatalog proposal that can be updated to 
use whatever identifier class we choose. That proposal is concerned with what 
information should be passed to define a table, and how to pass that 
information.

The main question for this discussion is: how should Spark identify tables, 
views, and functions when it supports multiple catalogs?

There are two main approaches:

  1.  Use a 3-part identifier, catalog.database.table
  2.  Use an identifier with an arbitrary number of parts

Option 1: use 3-part identifiers

The argument for option #1 is that it is simple. If an external data store has 
additional logical hierarchy layers, then that hierarchy would be mapped to 
multiple catalogs in Spark. Spark can support show tables and show databases 
without much trouble. This is the approach used by Presto, so there is some 
precedent for it.

The drawback is that mapping a more complex hierarchy into Spark requires more 
configuration. If an external DB has a 3-level hierarchy — say, for example, 
schema.database.table — then option #1 requires users to configure a catalog 
for each top-level structure, each schema. When a new schema is added, it is 
not 

[jira] [Assigned] (SPARK-26642) Add --num-executors option to spark-submit for Spark on K8S

2019-01-20 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung reassigned SPARK-26642:


Assignee: Luca Canali

> Add --num-executors option to spark-submit for Spark on K8S
> ---
>
> Key: SPARK-26642
> URL: https://issues.apache.org/jira/browse/SPARK-26642
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, Spark Core
>Affects Versions: 3.0.0
>Reporter: Luca Canali
>Assignee: Luca Canali
>Priority: Trivial
>
> Currently spark-submit supports the option --num-executors NUM only for Spark 
> on YARN. Users running Spark on K8S can specify the requested number of 
> executors in spark-submit with --conf spark.executor.instances=NUM
> This proposes to extend the spark-submit option --num-executors to be 
> applicable to Spark on K8S too. It is motivated by convenience, for example 
> when migrating jobs written for YARN to run on K8S.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26642) Add --num-executors option to spark-submit for Spark on K8S

2019-01-20 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung resolved SPARK-26642.
--
   Resolution: Fixed
Fix Version/s: 3.0.0

> Add --num-executors option to spark-submit for Spark on K8S
> ---
>
> Key: SPARK-26642
> URL: https://issues.apache.org/jira/browse/SPARK-26642
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, Spark Core
>Affects Versions: 3.0.0
>Reporter: Luca Canali
>Assignee: Luca Canali
>Priority: Trivial
> Fix For: 3.0.0
>
>
> Currently spark-submit supports the option --num-executors NUM only for Spark 
> on YARN. Users running Spark on K8S can specify the requested number of 
> executors in spark-submit with --conf spark.executor.instances=NUM
> This proposes to extend the spark-submit option --num-executors to be 
> applicable to Spark on K8S too. It is motivated by convenience, for example 
> when migrating jobs written for YARN to run on K8S.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: Persist Dataframe to HDFS considering HDFS Block Size.

2019-01-19 Thread Felix Cheung
You can call coalesce to combine partitions.
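
For illustration, a minimal sketch of that approach (table names and the target partition count are made up; pick the count from data volume vs. HDFS block size):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().enableHiveSupport().getOrCreate()
val df = spark.table("staging_table")          // hypothetical source

// Collapse the 400 shuffle outputs into fewer, larger files before writing.
// coalesce() only merges existing partitions, so it avoids another full shuffle.
val targetPartitions = 32                      // hypothetical, sized to ~HDFS block size
df.coalesce(targetPartitions)
  .write
  .mode("append")
  .insertInto("target_hive_table")             // hypothetical Hive table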



From: Shivam Sharma <28shivamsha...@gmail.com>
Sent: Saturday, January 19, 2019 7:43 AM
To: user@spark.apache.org
Subject: Persist Dataframe to HDFS considering HDFS Block Size.

Hi All,

I wanted to persist a DataFrame on HDFS. Basically, I am inserting data into a 
Hive table using Spark. Currently, at the time of writing to the Hive table, I have 
set the total shuffle partitions to 400, so 400 files are being created, which 
does not even consider the HDFS block size. How can I tell Spark to persist 
according to HDFS blocks?

We have something like this in Hive which solves this problem:

set hive.merge.sparkfiles=true;
set hive.merge.smallfiles.avgsize=204800;
set hive.merge.size.per.task=409600;

Thanks

--
Shivam Sharma
Indian Institute Of Information Technology, Design and Manufacturing Jabalpur
Mobile No- (+91) 8882114744
Email:- 28shivamsha...@gmail.com
LinkedIn:-https://www.linkedin.com/in/28shivamsharma


Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-19 Thread Felix Cheung
To clarify, YARN actually supports excluding nodes right when requesting 
resources. It's Spark that doesn't provide a way to populate such a blacklist.

If you can change the YARN config, the equivalent is node labels: 
https://hadoop.apache.org/docs/r2.7.4/hadoop-yarn/hadoop-yarn-site/NodeLabel.html
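
As a sketch of the node-label route (the config keys are the standard Spark-on-YARN ones; the label name and cluster setup are hypothetical):

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Constrain executors (and the AM) to YARN nodes carrying a given label, which
// effectively keeps the job off every node that does not carry that label.
val conf = new SparkConf()
  .set("spark.yarn.executor.nodeLabelExpression", "spark_ok")   // hypothetical label
  .set("spark.yarn.am.nodeLabelExpression", "spark_ok")

val spark = SparkSession.builder().config(conf).getOrCreate()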




From: Li Gao 
Sent: Saturday, January 19, 2019 8:43 AM
To: Felix Cheung
Cc: Serega Sheypak; user
Subject: Re: Spark on Yarn, is it possible to manually blacklist nodes before 
running spark job?

On YARN it is impossible, AFAIK. On Kubernetes you can use taints to keep 
certain nodes outside of Spark.

On Fri, Jan 18, 2019 at 9:35 PM Felix Cheung 
mailto:felixcheun...@hotmail.com>> wrote:
Not as far as I recall...



From: Serega Sheypak mailto:serega.shey...@gmail.com>>
Sent: Friday, January 18, 2019 3:21 PM
To: user
Subject: Spark on Yarn, is it possible to manually blacklist nodes before 
running spark job?

Hi, is there any possibility to tell Scheduler to blacklist specific nodes in 
advance?


Re: Requesting a review for updating license & notice information

2019-01-18 Thread Felix Cheung
Thanks for doing that and sending this note.

I will help check today.




From: Seunghyun Lee 
Sent: Friday, January 18, 2019 3:42 PM
To: dev@pinot.apache.org
Subject: Requesting a review for updating license & notice information

Hi Mentors,

I have recently tracked down all dependencies that we bundle for our
distribution and came up with license, notice files for Pinot project. I
followed Apache Spark's approach since it provides the detailed
documentation on how to build license, notice files.

Can someone look into the following pull request so that we can have some
initial feedbacks before we start to work on the release process?
https://github.com/apache/incubator-pinot/pull/3722

Best,
Seunghyun


Re: Original Marvin-AI Website Redirection

2019-01-18 Thread Felix Cheung
https://www.marvin-ai.org/

Does not load for me...



From: Daniel Takabayashi 
Sent: Friday, January 18, 2019 6:29 PM
To: dev@marvin.apache.org
Subject: Re: Original Marvin-AI Website Redirection

Problem solved. Could you check if it is working?

thanks,
Taka

On Fri, Jan 18, 2019 at 09:30, Daniel Takabayashi <daniel.takabaya...@gmail.com> wrote:

> Hi Guys,
>
> Actually, the redirect configuration is already done, but we are having
> this security problem. That must be something related to the Apache
> domain. To test, just visit the old domain and you will receive the following
> error:
>
> NET::ERR_CERT_COMMON_NAME_INVALID
>
> Subject: *.openoffice.org
>
> Issuer: COMODO RSA Domain Validation Secure Server CA
>
> Expires on: Jul 18, 2020
>
> Current date: Jan 18, 2019
>
>
> Any idea?
>
>
>
> On Thu, Jan 17, 2019 at 18:44, Wei Chen wrote:
>
>> Hello All,
>>
>> I found that the original website is currently redirected to Apache
>> OpenOffice.
>> https://www.marvin-ai.org/
>> Hello Taka, can you help to check the website redirection?
>> It should go to
>> https://marvin.apache.org
>>
>> Best Regards,
>> Wei
>>
>


Re: Spark on Yarn, is it possible to manually blacklist nodes before running spark job?

2019-01-18 Thread Felix Cheung
Not as far as I recall...



From: Serega Sheypak 
Sent: Friday, January 18, 2019 3:21 PM
To: user
Subject: Spark on Yarn, is it possible to manually blacklist nodes before 
running spark job?

Hi, is there any possibility to tell Scheduler to blacklist specific nodes in 
advance?


Re: [Discuss] 0.8.1 Release

2019-01-15 Thread Felix Cheung
+1 thanks!



From: Jeff Zhang 
Sent: Tuesday, January 15, 2019 5:39 PM
To: users
Subject: Re: [Discuss] 0.8.1 Release

Hi Folks,

I will start the 0.8.1 release since there's no concerns on this.

Jeff Zhang <zjf...@gmail.com> wrote on Monday, January 14, 2019 at 8:50 PM:
Hi Everyone,

Sorry for replying to this mail so late; I have been busy with other stuff. I plan to 
start the 0.8.1 release this week. Does anyone have any concerns?



Jeff Zhang <zjf...@gmail.com> wrote on Wed, Oct 10, 2018 at 8:31 AM:
Community is working on that.

Paul Brenner <pbren...@placeiq.com> wrote on Wed, Oct 10, 2018 at 12:35 AM:
I would second this if it doesn’t hold up the release too much. We would love 
to see this implemented.


Paul Brenner
SR. DATA SCIENTIST, PlaceIQ
(217) 390-3033

On Oct 9, 2018, 12:33 PM -0400, Pavel Myasnov <glowf...@gmail.com> wrote:
I know it is marked as an improvement, not a bug, but is it possible to include 
ticket https://jira.apache.org/jira/browse/ZEPPELIN-3307 in this minor release? 
I hit this problem quite often and it would be really nice to see it 
solved soon.

Pavel Myasnov

On 2018/09/27 07:08:31, Jeff Zhang wrote:
> Hi folks,
>
> It has been a while since the 0.8.0 release, and we have gotten a lot of feedback on it,
> so I think it is time for us to make the 0.8.1 release to fix the bugs of
> 0.8.0. Here's the umbrella ticket for the 0.8.1 release:
> https://jira.apache.org/jira/browse/ZEPPELIN-3629
>
> If you find any ticket that is necessary for 0.8.1 but not under this
> umbrella ticket, feel free to link it. I will start the 0.8.1 release at
> the beginning of Oct.
>


--
Best Regards

Jeff Zhang


--
Best Regards

Jeff Zhang


Re: [DISCUSS] Upgrade built-in Hive to 2.3.4

2019-01-15 Thread Felix Cheung
One common case we have is a custom input format.

In any case, even when the Hive metastore is protocol compatible, we should still 
upgrade or replace the Hive jar from a fork, as Sean says, from an ASF release 
process standpoint. Unless there is a plan for removing Hive integration (all 
of it) from the Spark core project...



From: Xiao Li 
Sent: Tuesday, January 15, 2019 10:03 AM
To: Felix Cheung
Cc: rb...@netflix.com; Yuming Wang; dev
Subject: Re: [DISCUSS] Upgrade built-in Hive to 2.3.4

Let me take my words back. To read/write a table, Spark users do not use the 
Hive execution JARs, unless they explicitly create the Hive serde tables. 
Actually, I want to understand the motivation and use cases: why do your usage 
scenarios need to create Hive serde tables instead of our Spark native tables?

BTW, we are still using Hive metastore as our metadata store. This does not 
require the Hive execution JAR upgrade, based on my understanding. Users can 
upgrade it to the newer version of Hive metastore.

Felix Cheung mailto:felixcheun...@hotmail.com>> 
于2019年1月15日周二 上午9:56写道:
And we are super 100% dependent on Hive...



From: Ryan Blue 
Sent: Tuesday, January 15, 2019 9:53 AM
To: Xiao Li
Cc: Yuming Wang; dev
Subject: Re: [DISCUSS] Upgrade built-in Hive to 2.3.4

How do we know that most Spark users are not using Hive? I wouldn't be 
surprised either way, but I do want to make sure we aren't making decisions 
based on any one person's (or one company's) experience about what "most" Spark 
users do.

On Tue, Jan 15, 2019 at 9:44 AM Xiao Li 
mailto:gatorsm...@gmail.com>> wrote:
Hi, Yuming,

Thank you for your contributions! The community aims at reducing the dependence 
on Hive. Currently, most Spark users are not using Hive. The changes look 
risky to me.

To support Hadoop 3.x, we just need to resolve this JIRA: 
https://issues.apache.org/jira/browse/HIVE-16391

Cheers,

Xiao

Yuming Wang mailto:wgy...@gmail.com>> 于2019年1月15日周二 上午8:41写道:
Dear Spark Developers and Users,

Hyukjin and I plan to upgrade the built-in Hive 
from 1.2.1-spark2<https://github.com/JoshRosen/hive/tree/release-1.2.1-spark2> 
to 2.3.4<https://github.com/apache/hive/releases/tag/rel%2Frelease-2.3.4> to 
solve some critical issues, such as supporting Hadoop 3.x and fixing some ORC and 
Parquet issues. This is the list:
Hive issues:
[SPARK-26332<https://issues.apache.org/jira/browse/SPARK-26332>][HIVE-10790] 
Spark sql write orc table on viewFS throws exception
[SPARK-25193<https://issues.apache.org/jira/browse/SPARK-25193>][HIVE-12505] 
insert overwrite doesn't throw exception when drop old data fails
[SPARK-26437<https://issues.apache.org/jira/browse/SPARK-26437>][HIVE-13083] 
Decimal data becomes bigint to query, unable to query
[SPARK-25919<https://issues.apache.org/jira/browse/SPARK-25919>][HIVE-11771] 
Date value corrupts when tables are "ParquetHiveSerDe" formatted and target 
table is Partitioned
[SPARK-12014<https://issues.apache.org/jira/browse/SPARK-12014>][HIVE-11100] 
Spark SQL query containing semicolon is broken in Beeline

Spark issues:
[SPARK-23534<https://issues.apache.org/jira/browse/SPARK-23534>] Spark run on 
Hadoop 3.0.0
[SPARK-20202<https://issues.apache.org/jira/browse/SPARK-20202>] Remove 
references to org.spark-project.hive
[SPARK-18673<https://issues.apache.org/jira/browse/SPARK-18673>] Dataframes 
doesn't work on Hadoop 3.x; Hive rejects Hadoop version
[SPARK-24766<https://issues.apache.org/jira/browse/SPARK-24766>] 
CreateHiveTableAsSelect and InsertIntoHiveDir won't generate decimal column 
stats in parquet


Since the code for the hive-thriftserver module has changed too much for this 
upgrade, I split it into two PRs for easy review.
The first PR<https://github.com/apache/spark/pull/23552> does not contain the 
changes of hive-thriftserver. Please ignore the failed test in 
hive-thriftserver.
The second PR<https://github.com/apache/spark/pull/23553> is complete changes.

I have created a Spark distribution for Apache Hadoop 2.7; you might download 
it via Google 
Drive<https://drive.google.com/open?id=1cq2I8hUTs9F4JkFyvRfdOJ5BlxV0ujgt> 
or Baidu Pan<https://pan.baidu.com/s/1b090Ctuyf1CDYS7c0puBqQ>.
Please help review and test. Thanks.


--
Ryan Blue
Software Engineer
Netflix


Re: [DISCUSS] Upgrade built-in Hive to 2.3.4

2019-01-15 Thread Felix Cheung
And we are super 100% dependent on Hive...



From: Ryan Blue 
Sent: Tuesday, January 15, 2019 9:53 AM
To: Xiao Li
Cc: Yuming Wang; dev
Subject: Re: [DISCUSS] Upgrade built-in Hive to 2.3.4

How do we know that most Spark users are not using Hive? I wouldn't be 
surprised either way, but I do want to make sure we aren't making decisions 
based on any one person's (or one company's) experience about what "most" Spark 
users do.

On Tue, Jan 15, 2019 at 9:44 AM Xiao Li 
mailto:gatorsm...@gmail.com>> wrote:
Hi, Yuming,

Thank you for your contributions! The community aims at reducing the dependence 
on Hive. Currently, most Spark users are not using Hive. The changes look 
risky to me.

To support Hadoop 3.x, we just need to resolve this JIRA: 
https://issues.apache.org/jira/browse/HIVE-16391

Cheers,

Xiao

Yuming Wang mailto:wgy...@gmail.com>> 于2019年1月15日周二 上午8:41写道:
Dear Spark Developers and Users,

Hyukjin and I plan to upgrade the built-in Hive 
from 1.2.1-spark2 
to 2.3.4 to 
solve some critical issues, such as supporting Hadoop 3.x and fixing some ORC and 
Parquet issues. This is the list:
Hive issues:
[SPARK-26332][HIVE-10790] 
Spark sql write orc table on viewFS throws exception
[SPARK-25193][HIVE-12505] 
insert overwrite doesn't throw exception when drop old data fails
[SPARK-26437][HIVE-13083] 
Decimal data becomes bigint to query, unable to query
[SPARK-25919][HIVE-11771] 
Date value corrupts when tables are "ParquetHiveSerDe" formatted and target 
table is Partitioned
[SPARK-12014][HIVE-11100] 
Spark SQL query containing semicolon is broken in Beeline

Spark issues:
[SPARK-23534] Spark run on 
Hadoop 3.0.0
[SPARK-20202] Remove 
references to org.spark-project.hive
[SPARK-18673] Dataframes 
doesn't work on Hadoop 3.x; Hive rejects Hadoop version
[SPARK-24766] 
CreateHiveTableAsSelect and InsertIntoHiveDir won't generate decimal column 
stats in parquet


Since the code for the hive-thriftserver module has changed too much for this 
upgrade, I split it into two PRs for easy review.
The first PR does not contain the 
changes of hive-thriftserver. Please ignore the failed test in 
hive-thriftserver.
The second PR is complete changes.

I have created a Spark distribution for Apache Hadoop 2.7; you might download 
it via Google 
Drive or 
Baidu Pan.
Please help review and test. Thanks.


--
Ryan Blue
Software Engineer
Netflix


Re: [DISCUSS] Upgrade built-in Hive to 2.3.4

2019-01-15 Thread Felix Cheung
Resolving https://issues.apache.org/jira/browse/HIVE-16391 means keeping Spark 
on Hive 1.2?

I’m not sure that is reducing the dependency on Hive - Hive is still there and it’s 
a very old Hive. IMO the risk increases the longer we stay on this. (And 
it’s been years.)

Looking at the two PRs, they don’t seem very drastic to me, except for the thrift 
server. Is there another, better approach for the thrift server?



From: Xiao Li 
Sent: Tuesday, January 15, 2019 9:44 AM
To: Yuming Wang
Cc: dev
Subject: Re: [DISCUSS] Upgrade built-in Hive to 2.3.4

Hi, Yuming,

Thank you for your contributions! The community aims at reducing the dependence 
on Hive. Currently, most Spark users are not using Hive. The changes look 
risky to me.

To support Hadoop 3.x, we just need to resolve this JIRA: 
https://issues.apache.org/jira/browse/HIVE-16391

Cheers,

Xiao

Yuming Wang mailto:wgy...@gmail.com>> 于2019年1月15日周二 上午8:41写道:
Dear Spark Developers and Users,

Hyukjin and I plan to upgrade the built-in Hive from 
1.2.1-spark2 to 
2.3.4 to solve 
some critical issues, such as supporting Hadoop 3.x and fixing some ORC and Parquet 
issues. This is the list:
Hive issues:
[SPARK-26332][HIVE-10790] 
Spark sql write orc table on viewFS throws exception
[SPARK-25193][HIVE-12505] 
insert overwrite doesn't throw exception when drop old data fails
[SPARK-26437][HIVE-13083] 
Decimal data becomes bigint to query, unable to query
[SPARK-25919][HIVE-11771] 
Date value corrupts when tables are "ParquetHiveSerDe" formatted and target 
table is Partitioned
[SPARK-12014][HIVE-11100] 
Spark SQL query containing semicolon is broken in Beeline

Spark issues:
[SPARK-23534] Spark run on 
Hadoop 3.0.0
[SPARK-20202] Remove 
references to org.spark-project.hive
[SPARK-18673] Dataframes 
doesn't work on Hadoop 3.x; Hive rejects Hadoop version
[SPARK-24766] 
CreateHiveTableAsSelect and InsertIntoHiveDir won't generate decimal column 
stats in parquet


Since the code for the hive-thriftserver module has changed too much for this 
upgrade, I split it into two PRs for easy review.
The first PR does not contain the 
changes of hive-thriftserver. Please ignore the failed test in 
hive-thriftserver.
The second PR is complete changes.

I have created a Spark distribution for Apache Hadoop 2.7; you might download 
it via Google 
Drive or 
Baidu Pan.
Please help review and test. Thanks.


[jira] [Commented] (SPARK-26565) modify dev/create-release/release-build.sh to let jenkins build packages w/o publishing

2019-01-14 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742845#comment-16742845
 ] 

Felix Cheung commented on SPARK-26565:
--

ok AFAIK

the first `cannot stat` is from this line

 
{code:java}
# Move R source package to match the Spark release version if the versions are not the same.
# NOTE(shivaram): `mv` throws an error on Linux if source and destination are same file
if [ "$R_PACKAGE_VERSION" != "$VERSION" ]; then
  mv "$SPARK_HOME/R/SparkR_$R_PACKAGE_VERSION.tar.gz" "$SPARK_HOME/R/SparkR_$VERSION.tar.gz"
fi
{code}
 

which should rename the tarball to SparkR_3.0.0-SNAPSHOT.tar.gz

(maybe it wasn't built?)
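A small defensive check before the rename would make that failure mode more obvious; this is only a sketch reusing the same variables, not something in release-build.sh today:

{code:java}
# hypothetical guard: fail early with a clear message if the R packaging step
# never produced the SparkR tarball that the rename above expects
if [ ! -f "$SPARK_HOME/R/SparkR_$R_PACKAGE_VERSION.tar.gz" ]; then
  echo "SparkR_$R_PACKAGE_VERSION.tar.gz not found under $SPARK_HOME/R; was the R package built?"
  exit 1
fi
{code}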

the second is from
{code:java}
# Remove the python distribution from dist/ if we built it
if [ "$MAKE_PIP" == "true" ]; then
rm -f "$DISTDIR"/python/dist/pyspark-*.tar.gz
fi

{code}
which is fine; it isn't the output we are looking for (it is not under dist).

 

 

> modify dev/create-release/release-build.sh to let jenkins build packages w/o 
> publishing
> ---
>
> Key: SPARK-26565
> URL: https://issues.apache.org/jira/browse/SPARK-26565
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.2.3, 2.3.3, 2.4.1, 3.0.0
>Reporter: shane knapp
>Assignee: shane knapp
>Priority: Major
> Attachments: fine.png, no-idea.jpg
>
>
> about a year+ ago, we stopped publishing releases directly from jenkins...
> this means that the spark-\{branch}-packaging builds are failing due to gpg 
> signing failures, and i would like to update these builds to *just* perform 
> packaging.
> example:
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-package/2183/console]
> i propose to change dev/create-release/release-build.sh...
> when the script is called w/the 'package' option, add an {{if}} statement to 
> skip the following sections when run on jenkins:
> 1) gpg signing of the source tarball (lines 184-187)
> 2) gpg signing of the sparkR dist (lines 243-248)
> 3) gpg signing of the python dist (lines 256-261)
> 4) gpg signing of the regular binary dist (lines 264-271)
> 5) the svn push of the signed dists (lines 317-332)
>  
> -another, and probably much better option, is to nuke the 
> spark-\{branch}-packaging builds and create new ones that just build things 
> w/o touching this incredible fragile shell scripting nightmare.-



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: [VOTE] Accept Hudi into the Apache Incubator

2019-01-14 Thread Felix Cheung
+1


On Mon, Jan 14, 2019 at 3:20 AM Suneel Marthi
 wrote:

> +1
>
> Sent from my iPhone
>
> > On Jan 13, 2019, at 5:34 PM, Thomas Weise  wrote:
> >
> > Hi all,
> >
> > Following the discussion of the Hudi proposal in [1], this is a vote
> > on accepting Hudi into the Apache Incubator,
> > per the ASF policy [2] and voting rules [3].
> >
> > A vote for accepting a new Apache Incubator podling is a
> > majority vote. Everyone is welcome to vote, only
> > Incubator PMC member votes are binding.
> >
> > This vote will run for at least 72 hours. Please VOTE as
> > follows:
> >
> > [ ] +1 Accept Hudi into the Apache Incubator
> > [ ] +0 Abstain
> > [ ] -1 Do not accept Hudi into the Apache Incubator because ...
> >
> > The proposal is included below, but you can also access it on
> > the wiki [4].
> >
> > Thanks for reviewing and voting,
> > Thomas
> >
> > [1]
> >
> https://lists.apache.org/thread.html/12e2bdaa095d68dae6f8731e473d3d43885783177d1b7e3ff2f65b6d@%3Cgeneral.incubator.apache.org%3E
> >
> > [2]
> >
> https://incubator.apache.org/policy/incubation.html#approval_of_proposal_by_sponsor
> >
> > [3] http://www.apache.org/foundation/voting.html
> >
> > [4] https://wiki.apache.org/incubator/HudiProposal
> >
> >
> >
> > = Hudi Proposal =
> >
> > == Abstract ==
> >
> > Hudi is a big-data storage library, that provides atomic upserts and
> > incremental data streams.
> >
> > Hudi manages data stored in Apache Hadoop and other API compatible
> > distributed file systems/cloud stores.
> >
> > == Proposal ==
> >
> > Hudi provides the ability to atomically upsert datasets with new values
> in
> > near-real time, making data available quickly to existing query engines
> > like Apache Hive, Apache Spark, & Presto. Additionally, Hudi provides a
> > sequence of changes to a dataset from a given point-in-time to enable
> > incremental data pipelines that yield greater efficiency & latency than
> > their typical batch counterparts. By carefully managing number of files &
> > sizes, Hudi greatly aids both query engines (e.g: always providing
> > well-sized files) and underlying storage (e.g: HDFS NameNode memory
> > consumption).
> >
> > Hudi is largely implemented as an Apache Spark library that reads/writes
> > data from/to Hadoop compatible filesystem. SQL queries on Hudi datasets
> are
> > supported via specialized Apache Hadoop input formats, that understand
> > Hudi’s storage layout. Currently, Hudi manages datasets using a
> combination
> > of Apache Parquet & Apache Avro file/serialization formats.
> >
> > == Background ==
> >
> > Apache Hadoop distributed filesystem (HDFS) & other compatible cloud
> > storage systems (e.g: Amazon S3, Google Cloud, Microsoft Azure) serve as
> > longer term analytical storage for thousands of organizations. Typical
> > analytical datasets are built by reading data from a source (e.g:
> upstream
> > databases, messaging buses, or other datasets), transforming the data,
> > writing results back to storage, & making it available for analytical
> > queries--all of this typically accomplished in batch jobs which operate
> in
> > a bulk fashion on partitions of datasets. Such a style of processing
> > typically incurs large delays in making data available to queries as well
> > as lot of complexity in carefully partitioning datasets to guarantee
> > latency SLAs.
> >
> > The need for fresher/faster analytics has increased enormously in the
> past
> > few years, as evidenced by the popularity of Stream processing systems
> like
> > Apache Spark, Apache Flink, and messaging systems like Apache Kafka. By
> > using updateable state store to incrementally compute & instantly reflect
> > new results to queries and using a “tailable” messaging bus to publish
> > these results to other downstream jobs, such systems employ a different
> > approach to building analytical dataset. Even though this approach yields
> > low latency, the amount of data managed in such real-time data-marts is
> > typically limited in comparison to the aforementioned longer term storage
> > options. As a result, the overall data architecture has become more
> complex
> > with more moving parts and specialized systems, leading to duplication of
> > data and a strain on usability.
> >
> > Hudi takes a hybrid approach. Instead of moving vast amounts of batch
> data
> > to streaming systems, we simply add the streaming primitives (upserts &
> > incremental consumption) onto existing batch processing technologies. We
> > believe that by adding some missing blocks to an existing Hadoop stack,
> we
> > are able to a provide similar capabilities right on top of Hadoop at a
> > reduced cost and with an increased efficiency, greatly simplifying the
> > overall architecture in the process.
> >
> > Hudi was originally developed at Uber (original name “Hoodie”) to address
> > such broad inefficiencies in ingest & ETL & ML pipelines across Uber’s
> data
> > ecosystem that required the upsert & incremental consumption primitives
> > 

Re: Clean out https://dist.apache.org/repos/dist/dev/spark/ ?

2019-01-13 Thread Felix Cheung
Eh, yeah, like the one with signing, I think the doc build is mostly useful a) 
right before we do a release or during the RC resets; and b) when someone makes a huge 
change to the docs and wants to check them.

Not sure we need this nightly?
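As for the actual cleanup, removing an old snapshot directory from the dev dist area 
should just be an svn delete against the repository URL; a sketch, where the directory 
name is a placeholder:

  svn delete -m "Remove outdated snapshot docs" \
    https://dist.apache.org/repos/dist/dev/spark/OLD-SNAPSHOT-DOCS-DIR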



From: Sean Owen 
Sent: Sunday, January 13, 2019 5:45 AM
To: Felix Cheung
Cc: Dongjoon Hyun; dev
Subject: Re: Clean out https://dist.apache.org/repos/dist/dev/spark/ ?

Will do. Er, maybe add Shane here too -- should we disable this docs
job? are these docs used, and is there much value in nightly snapshots
of the whole site?

On Sat, Jan 12, 2019 at 9:04 PM Felix Cheung  wrote:
>
> These get “published” by doc nightly build from riselab Jenkins...
>
>
> 
> From: Dongjoon Hyun 
> Sent: Saturday, January 12, 2019 4:32 PM
> To: Sean Owen
> Cc: dev
> Subject: Re: Clean out https://dist.apache.org/repos/dist/dev/spark/ ?
>
> +1 for removing old docs there.
> It seems that we need to upgrade our build script to maintain only one 
> published snapshot doc.
>
> Bests,
> Dongjoon.
>
> On Sat, Jan 12, 2019 at 2:18 PM Sean Owen  wrote:
>>
>> I'm not sure it matters a whole lot, but we are encouraged to keep
>> dist.apache.org free of old files. I see tons of old -docs snapshot
>> builds at https://dist.apache.org/repos/dist/dev/spark/ -- can I just
>> remove anything not so current?
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>


[jira] [Updated] (SPARK-26120) Fix a streaming query leak in Structured Streaming R tests

2019-01-12 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-26120:
-
Fix Version/s: 2.3.3

> Fix a streaming query leak in Structured Streaming R tests
> --
>
> Key: SPARK-26120
> URL: https://issues.apache.org/jira/browse/SPARK-26120
> Project: Spark
>  Issue Type: Test
>  Components: SparkR, Structured Streaming, Tests
>Affects Versions: 2.4.0
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>Priority: Minor
> Fix For: 2.3.3, 2.4.1, 3.0.0
>
>
> "Specify a schema by using a DDL-formatted string when reading" doesn't stop 
> the streaming query before stopping Spark. It causes the following annoying 
> logs.
> {code}
> Exception in thread "stream execution thread for [id = 
> 186dad10-e87f-4155-8119-00e0e63bbc1a, runId = 
> 2c0cc158-410b-442f-ac36-20f80ec429b1]" Exception in thread "stream execution 
> thread for people3 [id = ffa6136d-fe7b-4777-aa47-b0cb64d07ea4, runId = 
> 644b888e-9cce-4a09-bb5e-2fb122796c19]" org.apache.spark.SparkException: 
> Exception thrown in awaitResult: 
>   at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:355)
>   at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
>   at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:92)
>   at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:76)
>   at 
> org.apache.spark.sql.execution.streaming.state.StateStoreCoordinatorRef.deactivateInstances(StateStoreCoordinator.scala:108)
>   at 
> org.apache.spark.sql.streaming.StreamingQueryManager.notifyQueryTermination(StreamingQueryManager.scala:399)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runStream$2.apply(StreamExecution.scala:342)
>   at 
> org.apache.spark.util.UninterruptibleThread.runUninterruptibly(UninterruptibleThread.scala:77)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:323)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:204)
> Caused by: org.apache.spark.rpc.RpcEnvStoppedException: RpcEnv already 
> stopped.
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:158)
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:135)
>   at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:229)
>   at 
> org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:523)
>   at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:91)
>   ... 7 more
> org.apache.spark.SparkException: Exception thrown in awaitResult: 
>   at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:355)
>   at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
>   at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:92)
>   at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:76)
>   at 
> org.apache.spark.sql.execution.streaming.state.StateStoreCoordinatorRef.deactivateInstances(StateStoreCoordinator.scala:108)
>   at 
> org.apache.spark.sql.streaming.StreamingQueryManager.notifyQueryTermination(StreamingQueryManager.scala:399)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runStream$2.apply(StreamExecution.scala:342)
>   at 
> org.apache.spark.util.UninterruptibleThread.runUninterruptibly(UninterruptibleThread.scala:77)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:323)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:204)
> Caused by: org.apache.spark.rpc.RpcEnvStoppedException: RpcEnv already 
> stopped.
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:158)
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:135)
>   at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:229)
>   at 
> org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:523)
>   at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:91)
>   ... 7 more
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26565) modify dev/create-release/release-build.sh to let jenkins build packages w/o publishing

2019-01-12 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16741453#comment-16741453
 ] 

Felix Cheung commented on SPARK-26565:
--

Yeah, my point wasn’t to allow access to unsigned releases but to help the RM 
check out the built packages before kicking off the RC process.

For example, oftentimes the build completes successfully but there is some 
issue with the content.


> modify dev/create-release/release-build.sh to let jenkins build packages w/o 
> publishing
> ---
>
> Key: SPARK-26565
> URL: https://issues.apache.org/jira/browse/SPARK-26565
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.2.3, 2.3.3, 2.4.1, 3.0.0
>Reporter: shane knapp
>Assignee: shane knapp
>Priority: Major
> Attachments: fine.png, no-idea.jpg
>
>
> about a year+ ago, we stopped publishing releases directly from jenkins...
> this means that the spark-\{branch}-packaging builds are failing due to gpg 
> signing failures, and i would like to update these builds to *just* perform 
> packaging.
> example:
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-package/2183/console]
> i propose to change dev/create-release/release-build.sh...
> when the script is called w/the 'package' option, add an {{if}} statement to 
> skip the following sections when run on jenkins:
> 1) gpg signing of the source tarball (lines 184-187)
> 2) gpg signing of the sparkR dist (lines 243-248)
> 3) gpg signing of the python dist (lines 256-261)
> 4) gpg signing of the regular binary dist (lines 264-271)
> 5) the svn push of the signed dists (lines 317-332)
>  
> -another, and probably much better option, is to nuke the 
> spark-\{branch}-packaging builds and create new ones that just build things 
> w/o touching this incredible fragile shell scripting nightmare.-



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: Clean out https://dist.apache.org/repos/dist/dev/spark/ ?

2019-01-12 Thread Felix Cheung
These get “published” by doc nightly build from riselab Jenkins...



From: Dongjoon Hyun 
Sent: Saturday, January 12, 2019 4:32 PM
To: Sean Owen
Cc: dev
Subject: Re: Clean out https://dist.apache.org/repos/dist/dev/spark/ ?

+1 for removing old docs there.
It seems that we need to upgrade our build script to maintain only one 
published snapshot doc.

Bests,
Dongjoon.

On Sat, Jan 12, 2019 at 2:18 PM Sean Owen 
mailto:sro...@gmail.com>> wrote:
I'm not sure it matters a whole lot, but we are encouraged to keep
dist.apache.org free of old files. I see tons of old 
-docs snapshot
builds at https://dist.apache.org/repos/dist/dev/spark/ -- can I just
remove anything not so current?

-
To unsubscribe e-mail: 
dev-unsubscr...@spark.apache.org



[jira] [Updated] (SPARK-25572) SparkR tests failed on CRAN on Java 10

2019-01-12 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-25572:
-
Fix Version/s: 2.3.3

> SparkR tests failed on CRAN on Java 10
> --
>
> Key: SPARK-25572
> URL: https://issues.apache.org/jira/browse/SPARK-25572
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.4.0
>Reporter: Felix Cheung
>    Assignee: Felix Cheung
>Priority: Major
> Fix For: 2.3.3, 2.4.0
>
>
> follow up to SPARK-24255
> from the 2.3.2 release we can see that CRAN doesn't seem to respect the system 
> requirements when running tests - we have seen cases where SparkR is run on 
> Java 10, which unfortunately Spark does not start on. For 2.4.x, let's attempt 
> skipping all tests



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26010) SparkR vignette fails on CRAN on Java 11

2019-01-12 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-26010:
-
Fix Version/s: 2.3.3

> SparkR vignette fails on CRAN on Java 11
> 
>
> Key: SPARK-26010
> URL: https://issues.apache.org/jira/browse/SPARK-26010
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Felix Cheung
>    Assignee: Felix Cheung
>Priority: Major
> Fix For: 2.3.3, 2.4.1, 3.0.0
>
>
> follow up to SPARK-25572
> but for vignettes
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: spark2.4 arrow enabled true,error log not returned

2019-01-12 Thread Felix Cheung
Do you mean you run the same code on yarn and standalone? Can you check if they 
are running the same python versions?



From: Bryan Cutler 
Sent: Thursday, January 10, 2019 5:29 PM
To: libinsong1...@gmail.com
Cc: zlist Spark
Subject: Re: spark2.4 arrow enabled true,error log not returned

Hi, could you please clarify if you are running a YARN cluster when you see 
this problem?  I tried on Spark standalone and could not reproduce.  If it's on 
a YARN cluster, please file a JIRA and I can try to investigate further.

Thanks,
Bryan

On Sat, Dec 15, 2018 at 3:42 AM 李斌松 
mailto:libinsong1...@gmail.com>> wrote:
With spark 2.4 and arrow enabled set to true, the error log is not returned; in 
spark 2.3 there's no such problem.

1. spark.sql.execution.arrow.enabled=true
(inline screenshot omitted)
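For anyone trying to reproduce the setup, the setting can simply be passed when 
launching PySpark on YARN; a minimal sketch (the application code itself is not 
shown here):

  pyspark --master yarn --conf spark.sql.execution.arrow.enabled=true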
yarn log:

18/12/15 14:35:52 INFO CodeGenerator: Code generated in 1030.698785 ms
18/12/15 14:35:54 INFO PythonRunner: Times: total = 1985, boot = 1892, init = 
92, finish = 1
18/12/15 14:35:54 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1799 
bytes result sent to driver
18/12/15 14:35:55 INFO CoarseGrainedExecutorBackend: Got assigned task 1
18/12/15 14:35:55 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
18/12/15 14:35:55 INFO TorrentBroadcast: Started reading broadcast variable 1
18/12/15 14:35:55 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in 
memory (estimated size 8.3 KB, free 1048.8 MB)
18/12/15 14:35:55 INFO TorrentBroadcast: Reading broadcast variable 1 took 18 ms
18/12/15 14:35:55 INFO MemoryStore: Block broadcast_1 stored as values in 
memory (estimated size 14.0 KB, free 1048.8 MB)
18/12/15 14:35:55 INFO CodeGenerator: Code generated in 30.269745 ms
18/12/15 14:35:55 INFO PythonRunner: Times: total = 13, boot = 5, init = 7, 
finish = 1
18/12/15 14:35:55 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 1893 
bytes result sent to driver
18/12/15 14:35:55 INFO CoarseGrainedExecutorBackend: Got assigned task 2
18/12/15 14:35:55 INFO Executor: Running task 1.0 in stage 1.0 (TID 2)
18/12/15 14:35:55 ERROR Executor: Exception in task 1.0 in stage 1.0 (TID 2)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/usr/install/pyspark/2.4.0/pyspark.zip/pyspark/worker.py", line 377, in 
main
process()
  File "/usr/install/pyspark/2.4.0/pyspark.zip/pyspark/worker.py", line 372, in 
process
serializer.dump_stream(func(split_index, iterator), outfile)
  File "/usr/install/pyspark/2.4.0/pyspark.zip/pyspark/serializers.py", line 
390, in dump_stream
vs = list(itertools.islice(iterator, batch))
  File "/usr/install/pyspark/2.4.0/pyspark.zip/pyspark/util.py", line 99, in 
wrapper
return f(*args, **kwargs)
  File 
"/yarn/nm/usercache/admin/appcache/application_1544579748138_0215/container_e43_1544579748138_0215_01_01/python1.py",
 line 435, in mapfunc
ValueError: could not convert string to float: 'a'

at 
org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:452)
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:588)
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:571)
at 
org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:406)
at 
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at 
org.apache.spark.sql.execution.arrow.ArrowConverters$$anon$1.hasNext(ArrowConverters.scala:99)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at 
org.apache.spark.sql.execution.arrow.ArrowConverters$$anon$1.foreach(ArrowConverters.scala:97)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:104)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:48)
at 
scala.collection.TraversableOnce$class.to(TraversableOnce.scala:310)
at 
org.apache.spark.sql.execution.arrow.ArrowConverters$$anon$1.to(ArrowConverters.scala:97)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:302)
at 
org.apache.spark.sql.execution.arrow.ArrowConverters$$anon$1.toBuffer(ArrowConverters.scala:97)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:289)
at 
org.apache.spark.sql.execution.arrow.ArrowConverters$$anon$1.toArray(ArrowConverters.scala:97)
at 
org.apache.spark.sql.Dataset$$anonfun$collectAsArrowToPython$1$$anonfun$apply$17$$anonfun$apply$18.apply(Dataset.scala:3314)
at 
org.apache.spark.sql.Dataset$$anonfun$collectAsArrowToPython$1$$anonfun$apply$17$$anonfun$apply$18.apply(Dataset.scala:3314)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)

Re: [NOTICE] Mandatory migration of git repositories to gitbox.apache.org

2019-01-06 Thread Felix Cheung
Great, thanks!

Could someone take the lead to create the Infra JIRA? (All it needs is to 
reference this mail thread from lists.apache.org)

It should be very simple - once that is done there’s a small step for 
committers to connect their accounts on gitbox, that’s it.



From: Patrick Stuedi 
Sent: Sunday, January 6, 2019 11:47 AM
To: dev@crail.apache.org
Subject: Re: [NOTICE] Mandatory migration of git repositories to 
gitbox.apache.org

+1 from me too, we should create the infra JIRA as Luciano suggested.
Jonas has interacted with the infra team in the past.

On Sun, Jan 6, 2019 at 8:09 PM Felix Cheung  wrote:
>
> Hi there - any more vote/comment?
>
>
> 
> From: Luciano Resende 
> Sent: Friday, January 4, 2019 5:44 AM
> To: dev@crail.apache.org
> Subject: Re: [NOTICE] Mandatory migration of git repositories to 
> gitbox.apache.org
>
> Hey all, please use this thread if you have any concerns/questions
> about this move, otherwise please create a INFRA Jira issue and
> reference this thread.
>
> On Thu, Jan 3, 2019 at 8:19 AM Apache Infrastructure Team
>  wrote:
> >
> > Hello, crail folks.
> > As stated earlier in 2018, all git repositories must be migrated from
> > the git-wip-us.apache.org URL to gitbox.apache.org, as the old service
> > is being decommissioned. Your project is receiving this email because
> > you still have repositories on git-wip-us that needs to be migrated.
> >
> > The following repositories on git-wip-us belong to your project:
> > - incubator-crail.git
> > - incubator-crail-website.git
> >
> >
> > We are now entering the mandated (coordinated) move stage of the roadmap,
> > and you are asked to please coordinate migration with the Apache
> > Infrastructure Team before February 7th. All repositories not migrated
> > on February 7th will be mass migrated without warning, and we'd appreciate
> > it if we could work together to avoid a big mess that day :-).
> >
> > Moving to gitbox means you will get full write access on GitHub as well,
> > and be able to close/merge pull requests and much more.
> >
> > To have your repositories moved, please follow these steps:
> >
> > - Ensure consensus on the move (a link to a lists.apache.org thread will
> > suffice for us as evidence).
> > - Create a JIRA ticket at https://issues.apache.org/jira/browse/INFRA
> >
> > Your migration should only take a few minutes. If you wish to migrate
> > at a specific time of day or date, please do let us know in the ticket.
> >
> > As always, we appreciate your understanding and patience as we move
> > things around and work to provide better services and features for
> > the Apache Family.
> >
> > Should you wish to contact us with feedback or questions, please do so
> > at: us...@infra.apache.org.
> >
> >
> > With regards,
> > Apache Infrastructure
> >
>
>
> --
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/


Re: Spark Packaging Jenkins

2019-01-06 Thread Felix Cheung
Awesome Shane!



From: shane knapp 
Sent: Sunday, January 6, 2019 11:38 AM
To: Felix Cheung
Cc: Dongjoon Hyun; Wenchen Fan; dev
Subject: Re: Spark Packaging Jenkins

noted.  i like the idea of building (but not signing) the release and will 
update the job(s) this week.

On Sun, Jan 6, 2019 at 11:22 AM Felix Cheung 
mailto:felixcheun...@hotmail.com>> wrote:
https://spark.apache.org/release-process.html

Look for do-release-docker.sh script



From: Felix Cheung mailto:felixcheun...@hotmail.com>>
Sent: Sunday, January 6, 2019 11:17 AM
To: Dongjoon Hyun; Wenchen Fan
Cc: dev; shane knapp
Subject: Re: Spark Packaging Jenkins

The release process doc should have been updated on this - as mentioned we do 
not use Jenkins for release signing (take this offline if further discussion is 
needed)

The release build on Jenkins can still be useful for pre-validating the release 
build process (without actually signing it)



From: Dongjoon Hyun mailto:dongjoon.h...@gmail.com>>
Sent: Saturday, January 5, 2019 9:46 PM
To: Wenchen Fan
Cc: dev; shane knapp
Subject: Re: Spark Packaging Jenkins

Thank you, Wenchen.

I see. I'll update the doc and proceed to the next step manually as you advise. 
And it seems that we can stop the outdated Jenkins jobs, too.

Bests,
Dongjoon.

On Sat, Jan 5, 2019 at 20:15 Wenchen Fan 
mailto:cloud0...@gmail.com>> wrote:
IIRC there was a change to the release process: we stop using the shared gpg 
key on Jenkins, but use the personal key of the release manager. I'm not sure 
Jenkins can help testing package anymore.

BTW release manager needs to run the packaging script by himself. If there is a 
problem, the release manager will find it out sooner or later.



On Sun, Jan 6, 2019 at 6:34 AM Dongjoon Hyun 
mailto:dongjoon.h...@gmail.com>> wrote:
Hi, All.

It turns out that `gpg signing` is the next hurdle in Spark Packaging Jenkins.
Since 2.4.0 release, is there something changed in our Jenkins machine?

  gpg: skipped 
"/home/jenkins/workspace/spark-master-package/spark-utils/new-release-scripts/jenkins/jenkins-credentials-JEtz0nyn/gpg.tmp":
 No secret key
  gpg: signing failed: No secret key

Bests,
Dongjoon.


On Fri, Jan 4, 2019 at 11:52 AM shane knapp 
mailto:skn...@berkeley.edu>> wrote:
https://issues.apache.org/jira/browse/SPARK-26537

On Fri, Jan 4, 2019 at 11:31 AM shane knapp 
mailto:skn...@berkeley.edu>> wrote:
this may push in to early next week...  these builds were set up before my 
time, and i'm currently unraveling how they all work before pushing a commit to 
fix stuff.

nothing like some code archaeology to make my friday more exciting!  :)

shane

On Fri, Jan 4, 2019 at 11:08 AM Dongjoon Hyun 
mailto:dongjoon.h...@gmail.com>> wrote:
Thank you, Shane!

Bests,
Dongjoon.

On Fri, Jan 4, 2019 at 10:50 AM shane knapp 
mailto:skn...@berkeley.edu>> wrote:
yeah, i'll get on that today.  thanks for the heads up.

On Fri, Jan 4, 2019 at 10:46 AM Dongjoon Hyun 
mailto:dongjoon.h...@gmail.com>> wrote:
Hi, All

As a part of release process, we need to check Packaging/Compile/Test Jenkins 
status.

http://spark.apache.org/release-process.html

1. Spark Packaging: 
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/
2. Spark QA Compile: 
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/
3. Spark QA Test: https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/

Currently, (2) and (3) are working because it uses GitHub 
(https://github.com/apache/spark.git).
But, (1) seems to be broken because it's looking for old 
repo(https://git-wip-us.apache.org/repos/asf/spark.git/info/refs) instead of 
new GitBox.

Can we fix this in this week?

Bests,
Dongjoon.



--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: Spark Packaging Jenkins

2019-01-06 Thread Felix Cheung
https://spark.apache.org/release-process.html

Look for do-release-docker.sh script
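For reference, a typical invocation from the release docs looks roughly like the 
following; the output directory and the dry-run flag are assumptions to double-check 
against the script's own usage text:

  # hypothetical run: -d sets the working/output directory, -n does a dry run
  ./dev/create-release/do-release-docker.sh -d /path/to/spark-release -n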



From: Felix Cheung 
Sent: Sunday, January 6, 2019 11:17 AM
To: Dongjoon Hyun; Wenchen Fan
Cc: dev; shane knapp
Subject: Re: Spark Packaging Jenkins

The release process doc should have been updated on this - as mentioned we do 
not use Jenkins for release signing (take this offline if further discussion is 
needed)

The release build on Jenkins can still be useful for pre-validating the release 
build process (without actually signing it)



From: Dongjoon Hyun 
Sent: Saturday, January 5, 2019 9:46 PM
To: Wenchen Fan
Cc: dev; shane knapp
Subject: Re: Spark Packaging Jenkins

Thank you, Wenchen.

I see. I'll update the doc and proceed to the next step manually as you advise. 
And it seems that we can stop the outdated Jenkins jobs, too.

Bests,
Dongjoon.

On Sat, Jan 5, 2019 at 20:15 Wenchen Fan 
mailto:cloud0...@gmail.com>> wrote:
IIRC there was a change to the release process: we stop using the shared gpg 
key on Jenkins, but use the personal key of the release manager. I'm not sure 
Jenkins can help testing package anymore.

BTW release manager needs to run the packaging script by himself. If there is a 
problem, the release manager will find it out sooner or later.



On Sun, Jan 6, 2019 at 6:34 AM Dongjoon Hyun 
mailto:dongjoon.h...@gmail.com>> wrote:
Hi, All.

It turns out that `gpg signing` is the next hurdle in Spark Packaging Jenkins.
Since 2.4.0 release, is there something changed in our Jenkins machine?

  gpg: skipped 
"/home/jenkins/workspace/spark-master-package/spark-utils/new-release-scripts/jenkins/jenkins-credentials-JEtz0nyn/gpg.tmp":
 No secret key
  gpg: signing failed: No secret key

Bests,
Dongjoon.


On Fri, Jan 4, 2019 at 11:52 AM shane knapp 
mailto:skn...@berkeley.edu>> wrote:
https://issues.apache.org/jira/browse/SPARK-26537

On Fri, Jan 4, 2019 at 11:31 AM shane knapp 
mailto:skn...@berkeley.edu>> wrote:
this may push in to early next week...  these builds were set up before my 
time, and i'm currently unraveling how they all work before pushing a commit to 
fix stuff.

nothing like some code archaeology to make my friday more exciting!  :)

shane

On Fri, Jan 4, 2019 at 11:08 AM Dongjoon Hyun 
mailto:dongjoon.h...@gmail.com>> wrote:
Thank you, Shane!

Bests,
Dongjoon.

On Fri, Jan 4, 2019 at 10:50 AM shane knapp 
mailto:skn...@berkeley.edu>> wrote:
yeah, i'll get on that today.  thanks for the heads up.

On Fri, Jan 4, 2019 at 10:46 AM Dongjoon Hyun 
mailto:dongjoon.h...@gmail.com>> wrote:
Hi, All

As a part of release process, we need to check Packaging/Compile/Test Jenkins 
status.

http://spark.apache.org/release-process.html

1. Spark Packaging: 
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/
2. Spark QA Compile: 
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/
3. Spark QA Test: https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/

Currently, (2) and (3) are working because it uses GitHub 
(https://github.com/apache/spark.git).
But, (1) seems to be broken because it's looking for old 
repo(https://git-wip-us.apache.org/repos/asf/spark.git/info/refs) instead of 
new GitBox.

Can we fix this in this week?

Bests,
Dongjoon.



--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: Spark Packaging Jenkins

2019-01-06 Thread Felix Cheung
The release process doc should have been updated on this - as mentioned we do 
not use Jenkins for release signing (take this offline if further discussion is 
needed)

The release build on Jenkins can still be useful for pre-validating the release 
build process (without actually signing it)
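On the "No secret key" error quoted below: that gpg message generally just means no 
private signing key is present in the keyring on that machine, which is easy to 
confirm, for example:

  gpg --list-secret-keys   # empty output means there is no signing key on this host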



From: Dongjoon Hyun 
Sent: Saturday, January 5, 2019 9:46 PM
To: Wenchen Fan
Cc: dev; shane knapp
Subject: Re: Spark Packaging Jenkins

Thank you, Wenchen.

I see. I'll update the doc and proceed to the next step manually as you advise. 
And it seems that we can stop the outdated Jenkins jobs, too.

Bests,
Dongjoon.

On Sat, Jan 5, 2019 at 20:15 Wenchen Fan 
mailto:cloud0...@gmail.com>> wrote:
IIRC there was a change to the release process: we stop using the shared gpg 
key on Jenkins, but use the personal key of the release manager. I'm not sure 
Jenkins can help testing package anymore.

BTW release manager needs to run the packaging script by himself. If there is a 
problem, the release manager will find it out sooner or later.



On Sun, Jan 6, 2019 at 6:34 AM Dongjoon Hyun 
mailto:dongjoon.h...@gmail.com>> wrote:
Hi, All.

It turns out that `gpg signing` is the next hurdle in Spark Packaging Jenkins.
Since 2.4.0 release, is there something changed in our Jenkins machine?

  gpg: skipped 
"/home/jenkins/workspace/spark-master-package/spark-utils/new-release-scripts/jenkins/jenkins-credentials-JEtz0nyn/gpg.tmp":
 No secret key
  gpg: signing failed: No secret key

Bests,
Dongjoon.


On Fri, Jan 4, 2019 at 11:52 AM shane knapp 
mailto:skn...@berkeley.edu>> wrote:
https://issues.apache.org/jira/browse/SPARK-26537

On Fri, Jan 4, 2019 at 11:31 AM shane knapp 
mailto:skn...@berkeley.edu>> wrote:
this may push in to early next week...  these builds were set up before my 
time, and i'm currently unraveling how they all work before pushing a commit to 
fix stuff.

nothing like some code archaeology to make my friday more exciting!  :)

shane

On Fri, Jan 4, 2019 at 11:08 AM Dongjoon Hyun 
mailto:dongjoon.h...@gmail.com>> wrote:
Thank you, Shane!

Bests,
Dongjoon.

On Fri, Jan 4, 2019 at 10:50 AM shane knapp 
mailto:skn...@berkeley.edu>> wrote:
yeah, i'll get on that today.  thanks for the heads up.

On Fri, Jan 4, 2019 at 10:46 AM Dongjoon Hyun 
mailto:dongjoon.h...@gmail.com>> wrote:
Hi, All

As a part of release process, we need to check Packaging/Compile/Test Jenkins 
status.

http://spark.apache.org/release-process.html

1. Spark Packaging: 
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/
2. Spark QA Compile: 
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/
3. Spark QA Test: https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test/

Currently, (2) and (3) are working because it uses GitHub 
(https://github.com/apache/spark.git).
But, (1) seems to be broken because it's looking for old 
repo(https://git-wip-us.apache.org/repos/asf/spark.git/info/refs) instead of 
new GitBox.

Can we fix this in this week?

Bests,
Dongjoon.



--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: Apache Spark 2.2.3 ?

2019-01-02 Thread Felix Cheung
+1 on 2.2.3 of course



From: Dongjoon Hyun 
Sent: Wednesday, January 2, 2019 12:21 PM
To: Saisai Shao
Cc: Xiao Li; Felix Cheung; Sean Owen; dev
Subject: Re: Apache Spark 2.2.3 ?

Thank you for the swift feedback, and Happy New Year. :)
For 2.2.3 release on next week, I see two positive opinions (including mine)
and don't see any direct objections.

Apache Spark has a mature, resourceful, and fast-growing community.
One important characteristic of a mature community is
predictable behavior that users are able to depend on.
For instance, we have a nice tradition of cutting the branch as a sign of feature 
freeze.
The *final* release of a branch is not only good for the end users, but also a 
good sign of the EOL of the branch for all.

As a junior committer of the community, I want to contribute to deliver the 
final 2.2.3 release to the community and to finalize `branch-2.2`.

* For Apache Spark JIRA, I checked that there are no on-going issues targeting 
`2.2.3`.
* For commits, I reviewed the newly landed commits after `2.2.2` tag and 
updated a few missing JIRA issues accordingly.
* Apparently, we can release 2.2.3 next week.

BTW, I'm +1 for the next 2.3/2.4 releases and have been expecting them before 
Spark+AI Summit (April), because that is what we have usually done.
Please send another email to the `dev` mailing list because it's worth receiving 
more attention and requests there.

Bests,
Dongjoon.


On Tue, Jan 1, 2019 at 9:35 PM Saisai Shao 
mailto:sai.sai.s...@gmail.com>> wrote:
Agreed to have a new branch-2.3 release, as we already accumulated several 
fixes.

Thanks
Saisai

Xiao Li mailto:lix...@databricks.com>> 于2019年1月2日周三 
下午1:32写道:
Based on the commit history, 
https://gitbox.apache.org/repos/asf?p=spark.git;a=shortlog;h=refs/heads/branch-2.3
 contains more critical fixes. Maybe the priority is higher?

On Tue, Jan 1, 2019 at 9:22 PM Felix Cheung 
mailto:felixcheun...@hotmail.com>> wrote:
Speaking of, it’s been 3 months since 2.3.2... (Sept 2018)

And 2 months since 2.4.0 (Nov 2018) - does the community feel 2.4 branch is 
stabilizing?



From: Sean Owen mailto:sro...@gmail.com>>
Sent: Tuesday, January 1, 2019 8:30 PM
To: Dongjoon Hyun
Cc: dev
Subject: Re: Apache Spark 2.2.3 ?

I agree with that logic, and if you're volunteering to do the legwork,
I don't see a reason not to cut a final 2.2 release.

On Tue, Jan 1, 2019 at 9:19 PM Dongjoon Hyun 
mailto:dongjoon.h...@gmail.com>> wrote:
>
> Hi, All.
>
> Apache Spark community has a policy maintaining the feature branch for 18 
> months. I think it's time for the 2.2.3 release since 2.2.0 is released on 
> July 2017.
>
> http://spark.apache.org/versioning-policy.html
>
> After 2.2.2 (July 2018), `branch-2.2` has 40 patches (including security 
> patches).
>
> https://gitbox.apache.org/repos/asf?p=spark.git;a=shortlog;h=refs/heads/branch-2.2
>
> If it's okay and there is no further plan on `branch-2.2`, I want to 
> volunteer to prepare the first RC (early next week?).
>
> Please let me know your opinions about this.
>
> Bests,
> Dongjoon.

-
To unsubscribe e-mail: 
dev-unsubscr...@spark.apache.org<mailto:dev-unsubscr...@spark.apache.org>





Re: Apache Spark 2.2.3 ?

2019-01-01 Thread Felix Cheung
Speaking of, it’s been 3 months since 2.3.2... (Sept 2018)

And 2 months since 2.4.0 (Nov 2018) - does the community feel 2.4 branch is 
stabilizing?



From: Sean Owen 
Sent: Tuesday, January 1, 2019 8:30 PM
To: Dongjoon Hyun
Cc: dev
Subject: Re: Apache Spark 2.2.3 ?

I agree with that logic, and if you're volunteering to do the legwork,
I don't see a reason not to cut a final 2.2 release.

On Tue, Jan 1, 2019 at 9:19 PM Dongjoon Hyun  wrote:
>
> Hi, All.
>
> Apache Spark community has a policy maintaining the feature branch for 18 
> months. I think it's time for the 2.2.3 release since 2.2.0 is released on 
> July 2017.
>
> http://spark.apache.org/versioning-policy.html
>
> After 2.2.2 (July 2018), `branch-2.2` has 40 patches (including security 
> patches).
>
> https://gitbox.apache.org/repos/asf?p=spark.git;a=shortlog;h=refs/heads/branch-2.2
>
> If it's okay and there is no further plan on `branch-2.2`, I want to 
> volunteer to prepare the first RC (early next week?).
>
> Please let me know your opinions about this.
>
> Bests,
> Dongjoon.

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Podling Report Reminder - January 2019

2018-12-28 Thread Felix Cheung
Thanks Subbu.

Is this the first report? If so, you can add: source repo migrated to Apache, 
mailing lists and JIRA created. Also a website created.

Otherwise LGTM



From: Subbu Subramaniam 
Sent: Friday, December 28, 2018 5:12 PM
To: g.kish...@gmail.com; dev@pinot.apache.org
Subject: Re: Podling Report Reminder - January 2019

Thanks Felix.

I have updated the podling page. Please take a look and let me know if there is 
anything else needed.

-Subbu

From: Felix Cheung 
Sent: Thursday, December 27, 2018 12:12 PM
To: dev@pinot.apache.org; g.kish...@gmail.com
Subject: Re: Podling Report Reminder - January 2019

Another reminder since we are less than one week away from the deadline.


From: Felix Cheung 
Sent: Friday, December 21, 2018 8:07 PM
To: dev@pinot.apache.org; g.kish...@gmail.com
Subject: Re: Podling Report Reminder - January 2019

Hi - quick reminder please see below.

Report is due Wed January 02 -- Podling reports due by end of day

Best to kick off discussions soon. Happy holidays!



From: jmcl...@apache.org
Sent: Friday, December 21, 2018 4:47 PM
To: d...@pinot.incubator.apache.org
Subject: Podling Report Reminder - January 2019

Dear podling,

This email was sent by an automated system on behalf of the Apache
Incubator PMC. It is an initial reminder to give you plenty of time to
prepare your quarterly board report.

The board meeting is scheduled for Wed, 16 January 2019, 10:30 am PDT.
The report for your podling will form a part of the Incubator PMC
report. The Incubator PMC requires your report to be submitted 2 weeks
before the board meeting, to allow sufficient time for review and
submission (Wed, January 02).

Please submit your report with sufficient time to allow the Incubator
PMC, and subsequently board members to review and digest. Again, the
very latest you should submit your report is 2 weeks prior to the board
meeting.

Candidate names should not be made public before people are actually
elected, so please do not include the names of potential committers or
PPMC members in your report.

Thanks,

The Apache Incubator PMC

Submitting your Report

--

Your report should contain the following:

* Your project name
* A brief description of your project, which assumes no knowledge of
the project or necessarily of its field
* A list of the three most important issues to address in the move
towards graduation.
* Any issues that the Incubator PMC or ASF Board might wish/need to be
aware of
* How has the community developed since the last report
* How has the project developed since the last report.
* How does the podling rate their own maturity.

This should be appended to the Incubator Wiki page at:

https://wiki.apache.org/incubator/January2019

Note: This is manually populated. You may need to wait a little before
this page is created from a template.

Mentors
---

Mentors should review reports for their project(s) and sign them off on
the Incubator wiki page. Signing off reports shows that you are
following the project - projects that are not signed may raise alarms
for the Incubator PMC.

Incubator PMC

-
To unsubscribe, e-mail: dev-unsubscr...@pinot.apache.org
For additional commands, e-mail: dev-h...@pinot.apache.org



Pinot website and DOAP

2018-12-28 Thread Felix Cheung
Hello,

I’ve opened a PR to update some trademark references on the website
https://github.com/apache/incubator-pinot-site/pull/4

Also open JIRA on link back
https://issues.apache.org/jira/browse/PINOT-2

And JIRA on DOAP
https://issues.apache.org/jira/browse/PINOT-3

Regards



Re: Podling Report Reminder - January 2019

2018-12-27 Thread Felix Cheung
Another reminder since we are less than one week away from the deadline.


From: Felix Cheung 
Sent: Friday, December 21, 2018 8:07 PM
To: dev@pinot.apache.org; g.kish...@gmail.com
Subject: Re: Podling Report Reminder - January 2019

Hi - quick reminder please see below.

Report is due Wed January 02 -- Podling reports due by end of day

Best to kick off discussions soon. Happy holidays!



From: jmcl...@apache.org
Sent: Friday, December 21, 2018 4:47 PM
To: d...@pinot.incubator.apache.org
Subject: Podling Report Reminder - January 2019

Dear podling,

This email was sent by an automated system on behalf of the Apache
Incubator PMC. It is an initial reminder to give you plenty of time to
prepare your quarterly board report.

The board meeting is scheduled for Wed, 16 January 2019, 10:30 am PDT.
The report for your podling will form a part of the Incubator PMC
report. The Incubator PMC requires your report to be submitted 2 weeks
before the board meeting, to allow sufficient time for review and
submission (Wed, January 02).

Please submit your report with sufficient time to allow the Incubator
PMC, and subsequently board members to review and digest. Again, the
very latest you should submit your report is 2 weeks prior to the board
meeting.

Candidate names should not be made public before people are actually
elected, so please do not include the names of potential committers or
PPMC members in your report.

Thanks,

The Apache Incubator PMC

Submitting your Report

--

Your report should contain the following:

* Your project name
* A brief description of your project, which assumes no knowledge of
the project or necessarily of its field
* A list of the three most important issues to address in the move
towards graduation.
* Any issues that the Incubator PMC or ASF Board might wish/need to be
aware of
* How has the community developed since the last report
* How has the project developed since the last report.
* How does the podling rate their own maturity.

This should be appended to the Incubator Wiki page at:

https://wiki.apache.org/incubator/January2019

Note: This is manually populated. You may need to wait a little before
this page is created from a template.

Mentors
---

Mentors should review reports for their project(s) and sign them off on
the Incubator wiki page. Signing off reports shows that you are
following the project - projects that are not signed may raise alarms
for the Incubator PMC.

Incubator PMC

-
To unsubscribe, e-mail: dev-unsubscr...@pinot.apache.org
For additional commands, e-mail: dev-h...@pinot.apache.org



Re: Podling Report Reminder - January 2019

2018-12-21 Thread Felix Cheung
Hi - quick reminder please see below.

Report is due Wed January 02 -- Podling reports due by end of day

Best to kick off discussions soon. Happy holidays!



From: jmcl...@apache.org
Sent: Friday, December 21, 2018 4:47 PM
To: d...@pinot.incubator.apache.org
Subject: Podling Report Reminder - January 2019

Dear podling,

This email was sent by an automated system on behalf of the Apache
Incubator PMC. It is an initial reminder to give you plenty of time to
prepare your quarterly board report.

The board meeting is scheduled for Wed, 16 January 2019, 10:30 am PDT.
The report for your podling will form a part of the Incubator PMC
report. The Incubator PMC requires your report to be submitted 2 weeks
before the board meeting, to allow sufficient time for review and
submission (Wed, January 02).

Please submit your report with sufficient time to allow the Incubator
PMC, and subsequently board members to review and digest. Again, the
very latest you should submit your report is 2 weeks prior to the board
meeting.

Candidate names should not be made public before people are actually
elected, so please do not include the names of potential committers or
PPMC members in your report.

Thanks,

The Apache Incubator PMC

Submitting your Report

--

Your report should contain the following:

* Your project name
* A brief description of your project, which assumes no knowledge of
the project or necessarily of its field
* A list of the three most important issues to address in the move
towards graduation.
* Any issues that the Incubator PMC or ASF Board might wish/need to be
aware of
* How has the community developed since the last report
* How has the project developed since the last report.
* How does the podling rate their own maturity.

This should be appended to the Incubator Wiki page at:

https://wiki.apache.org/incubator/January2019

Note: This is manually populated. You may need to wait a little before
this page is created from a template.

Mentors
---

Mentors should review reports for their project(s) and sign them off on
the Incubator wiki page. Signing off reports shows that you are
following the project - projects that are not signed may raise alarms
for the Incubator PMC.

Incubator PMC

-
To unsubscribe, e-mail: dev-unsubscr...@pinot.apache.org
For additional commands, e-mail: dev-h...@pinot.apache.org



Re: Requested gitbox migration

2018-12-19 Thread Felix Cheung
Thanks!


From: Jongyoul Lee 
Sent: Wednesday, December 19, 2018 6:09:50 PM
To: dev
Subject: Requested gitbox migration

Please check the link below:
https://issues.apache.org/jira/browse/INFRA-17477

JL

--
이종열, Jongyoul Lee, 李宗烈
http://madeng.net


Re: [DISCUSS] Make interpreters' repository

2018-12-14 Thread Felix Cheung
Sure.



From: Jeff Zhang 
Sent: Thursday, December 13, 2018 11:29 PM
To: Jongyoul Lee
Cc: dev
Subject: Re: [DISCUSS] Make interpreters' repository

Thanks @Jongyoul Lee  , let's clean the interface
first.

Jongyoul Lee  wrote on Friday, December 14, 2018 at 1:51 PM:

> Thanks, Jeff and Felix,
>
> I simply thought it would be better to construct a clearer view for
> devs. But, as you mentioned, it could scatter our attention. Let's try
> to clean up our code and test scenarios inside one repository.
>
> Thank you guys.
>
> On Fri, Dec 14, 2018 at 12:03 PM Jeff Zhang  wrote:
>
>> Agree with Felix, this is also what I was concerned about in my last email. @Jongyoul
>> Lee  Could you explain more about how a separate repo would
>> help here? Thanks
>>
>>
>>> Felix Cheung  wrote on Friday, December 14, 2018 at 11:00 AM:
>>
>>> In my opinion, a clean interface will definitely be very useful, and
>>> having a better way to test is good.
>>>
>>> But it sounds to me like these should be possible without separating the
>>> code repos?
>>>
>>> The downside of a separate repo (I assume still under ASF) is spreading
>>> the attention of committers and contributors.
>>>
>>>
>>>
>>> 
>>> From: Jongyoul Lee 
>>> Sent: Tuesday, December 11, 2018 10:33 PM
>>> To: dev
>>> Subject: Re: [DISCUSS] Make interpreters' repository
>>>
>>> And for testing Zeppelin as well, we don't have to build the Spark
>>> interpreter again for integration tests. We don't even have to build the
>>> Zeppelin server and web modules for the Spark integration tests; we just
>>> use the components that were already built. I believe it makes our CI
>>> faster as well.
>>>
>>> On Wed, Dec 12, 2018 at 3:29 PM Jongyoul Lee  wrote:
>>>
>>> > Yes, right. BTW, I think we need to make the dependencies clear between
>>> > zeppelin-server and the interpreters, and even among interpreters. Some
>>> > version properties are used in both zeppelin-server and the interpreters,
>>> > but no one has a clear view of them. So I thought that would change when
>>> > we divide the repositories. The second point is about building and
>>> > compiling. We don't have to build Zeppelin fully when building some
>>> > components; we can already do it with custom build options such as
>>> > '-pl !...'. I don't think that's good, and there's no reason to keep this
>>> > kind of inconvenience. What do you think?
>>> >
>>> > Regards,
>>> > JL
>>> >
>>> > On Wed, Dec 12, 2018 at 3:08 PM Jeff Zhang  wrote:
>>> >
>>> >> Hi Jongyoul,
>>> >>
>>> >> Thanks for bringing this up. I don't understand how a different repo
>>> >> would help here, but I have thought about moving interpreters out of
>>> >> Zeppelin for a long time; I just don't have the bandwidth for it. The
>>> >> release cycle of the Zeppelin core components (zeppelin-zengine,
>>> >> zeppelin-server) should not block the release of the interpreter
>>> >> components (unless they depend on some features of zeppelin-zengine or
>>> >> zeppelin-server).
>>> >>
>>> >>
>>> >> Jongyoul Lee  wrote on Wednesday, December 12, 2018 at 10:38 AM:
>>> >>
>>> >> > Hi, dev and committers,
>>> >> >
>>> >> > Currently, I'm looking at the repositories of other Apache projects.
>>> >> > They have several repositories with different purposes. I'd like to
>>> >> > suggest that we divide our repositories between zeppelin-server and
>>> >> > the others.
>>> >> >
>>> >> > This will help us develop zeppelin-server without interference from
>>> >> > other components and their dependencies. Even in the case of
>>> >> > interpreters, it will provide more independent environments for
>>> >> > developing the interpreters themselves. Currently, we have a lot of
>>> >> > dependencies and various versions for each interpreter.
>>> >> >
>>> >> > WDYT?
>>> >> >
>>> >> > Regards,
>>> >> > JL
>>> >> >
>>> >> > --
>>> >> > 이종열, Jongyoul Lee, 李宗烈
>>> >> > http://madeng.net
>>> >> >
>>> >>
>>> >>
>>> >> --
>>> >> Best Regards
>>> >>
>>> >> Jeff Zhang
>>> >>
>>> >
>>> >
>>> > --
>>> > 이종열, Jongyoul Lee, 李宗烈
>>> > http://madeng.net
>>> >
>>>
>>>
>>> --
>>> 이종열, Jongyoul Lee, 李宗烈
>>> http://madeng.net
>>>
>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>>
>
>
> --
> 이종열, Jongyoul Lee, 李宗烈
> http://madeng.net
>


--
Best Regards

Jeff Zhang
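
For context on the '-pl !...' option mentioned in the thread above: it is Maven's
reactor module selection, which already lets you build a subset of a multi-module
project. A minimal sketch, invoked from Python; the module name is illustrative,
not Zeppelin's actual artifact id:

    import subprocess

    # Build a single interpreter module plus whatever it depends on ("-am"),
    # instead of the whole multi-module reactor. Passing "-pl" with a leading
    # "!" (e.g. "-pl" "!some-module") would instead exclude a module.
    # The module name "spark" below is illustrative only.
    subprocess.run(
        ["mvn", "package", "-pl", "spark", "-am", "-DskipTests"],
        check=True,
    )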


Re: [DISCUSS] Fwd: [NOTICE] Mandatory relocation of Apache git repositories on git-wip-us.apache.org

2018-12-14 Thread Felix Cheung
+1

Then let’s go ahead with this. I’m going to keep this open over the weekend and 
then open an INFRA ticket unless anyone has a concern.




From: Prabhjyot Singh 
Sent: Tuesday, December 11, 2018 9:22 AM
To: dev
Subject: Re: [DISCUSS] Fwd: [NOTICE] Mandatory relocation of Apache git 
repositories on git-wip-us.apache.org

+1 for earlier.

On Tue, 11 Dec 2018 at 22:46, Felix Cheung 
wrote:

> Yes that’s all it takes to migrate.
>
> (And committers to setup gitbox link)
>
> Any more thoughts from the community?
>
>
> 
> From: Jongyoul Lee 
> Sent: Tuesday, December 11, 2018 1:38 AM
> To: dev
> Subject: Re: [DISCUSS] Fwd: [NOTICE] Mandatory relocation of Apache git
> repositories on git-wip-us.apache.org
>
> We could create a ticket for the infra only, correct?
>
> On Mon, Dec 10, 2018 at 12:45 PM Jeff Zhang  wrote:
>
> > Definitely +1 for earlier, anyone volunteer for this ?
> >
> >
> > Jongyoul Lee  wrote on Monday, December 10, 2018 at 11:34 AM:
> >
> > > I don't think we have any special reason not to move there.
> > >
> > > +1 for earlier
> > >
> > > On Mon, Dec 10, 2018 at 3:56 AM Felix Cheung 
> > > wrote:
> > >
> > > > Hi community,
> > > >
> > > The move to gitbox is coming. This does not affect Contributors - mostly
> > > how PR is merged. We could choose to voluntarily move early, or wait till
> > > later.
> > > >
> > > > So to discuss, should we move early?
> > > >
> > > >
> > > > -- Forwarded message -
> > > > From: Daniel Gruno 
> > > > Date: Fri, Dec 7, 2018 at 8:54 AM
> > > > Subject: [NOTICE] Mandatory relocation of Apache git repositories on
> > > > git-wip-us.apache.org
> > > > To: us...@infra.apache.org 
> > > >
> > > >
> > > > [IF YOUR PROJECT DOES NOT HAVE GIT REPOSITORIES ON GIT-WIP-US PLEASE
> > > > DISREGARD THIS EMAIL; IT WAS MASS-MAILED TO ALL APACHE PROJECTS]
> > > >
> > > > Hello Apache projects,
> > > >
> > > > I am writing to you because you may have git repositories on the
> > > > git-wip-us server, which is slated to be decommissioned in the coming
> > > > months. All repositories will be moved to the new gitbox service which
> > > > includes direct write access on github as well as the standard ASF
> > > > commit access via gitbox.apache.org.
> > > >
> > > > ## Why this move? ##
> > > > The move comes as a result of retiring the git-wip service, as the
> > > > hardware it runs on is longing for retirement. In lieu of this, we
> > > > have decided to consolidate the two services (git-wip and gitbox), to
> > > > ease the management of our repository systems and future-proof the
> > > > underlying hardware. The move is fully automated, and ideally, nothing
> > > > will change in your workflow other than added features and access to
> > > > GitHub.
> > > >
> > > > ## Timeframe for relocation ##
> > > > Initially, we are asking that projects voluntarily request to move
> > > > their repositories to gitbox, hence this email. The voluntary
> > > > timeframe is between now and January 9th 2019, during which projects
> > > > are free to either move over to gitbox or stay put on git-wip. After
> > > > this phase, we will be requiring the remaining projects to move within
> > > > one month, after which we will move the remaining projects over.
> > > >
> > > > To have your project moved in this initial phase, you will need:
> > > >
> > > > - Consensus in the project (documented via the mailing list)
> > > > - File a JIRA ticket with INFRA to voluntarily move your project repos
> > > > over to gitbox (as stated, this is highly automated and will take
> > > > between a minute and an hour, depending on the size and number of
> > > > your repositories)
> > > >
> > > > To sum up the preliminary timeline;
> > > >
> > > > - December 9th 2018 -> January 9th 2019: Voluntary (coordinated)
> > > > relocation
> > > > - January 9th -> February 6th: Mandated (coordinated) relocation
> > > > - February 7th: All remaining repositories are mass migrated.
> > > >
> > > > This timeline may change to accommodate various scenarios.

Re: [DISCUSS] Moving to gitbox

2018-12-14 Thread Felix Cheung
I believe that’s what the earlier thread is for.



From: Jongyoul Lee 
Sent: Thursday, December 13, 2018 9:54 PM
To: dev
Subject: Re: [DISCUSS] Moving to gitbox

Yes, right.

That's because Infra suggests that we have a mail thread documenting the
consensus about it.


On Fri, Dec 14, 2018 at 12:14 PM Felix Cheung 
wrote:

> Hi Jongyoul - is this the same as the earlier thread?
>
>
> 
> From: Jongyoul Lee 
> Sent: Tuesday, December 11, 2018 6:28 PM
> To: dev
> Subject: [DISCUSS] Moving to gitbox
>
> Hi, devs,
>
> I'd like to reach a consensus on moving our repository from git-wip to gitbox.
>
> Please give your opinions by replying to this email.
>
> Thanks in advance,
> JL
>
> --
> 이종열, Jongyoul Lee, 李宗烈
> http://madeng.net
>


--
이종열, Jongyoul Lee, 李宗烈
http://madeng.net


Re: [DISCUSS] Moving to gitbox

2018-12-13 Thread Felix Cheung
Hi Jongyoul - is this the same as the earlier thread?



From: Jongyoul Lee 
Sent: Tuesday, December 11, 2018 6:28 PM
To: dev
Subject: [DISCUSS] Moving to gitbox

Hi, devs,

I'd like to reach a consensus on moving our repository from git-wip to gitbox.

Please give your opinions by replying to this email.

Thanks in advance,
JL

--
이종열, Jongyoul Lee, 李宗烈
http://madeng.net


Re: [DISCUSS] Make interpreters' repository

2018-12-13 Thread Felix Cheung
In my opinion, a clean interface will definitely be very useful, and having a
better way to test is good.

But it sounds to me like these should be possible without separating the code repos?

The downside of a separate repo (I assume still under ASF) is spreading the
attention of committers and contributors.




From: Jongyoul Lee 
Sent: Tuesday, December 11, 2018 10:33 PM
To: dev
Subject: Re: [DISCUSS] Make interpreters' repository

And for testing Zeppelin as well, we don't have to build the Spark interpreter
again for integration tests. We don't even have to build the Zeppelin server and
web modules for the Spark integration tests; we just use the components that were
already built. I believe it makes our CI faster as well.

On Wed, Dec 12, 2018 at 3:29 PM Jongyoul Lee  wrote:

> Yes, right. BTW, I think we need to make the dependencies clear between
> zeppelin-server and the interpreters, and even among interpreters. Some
> version properties are used in both zeppelin-server and the interpreters, but
> no one has a clear view of them. So I thought that would change when we
> divide the repositories. The second point is about building and compiling. We
> don't have to build Zeppelin fully when building some components; we can
> already do it with custom build options such as '-pl !...'. I don't think
> that's good, and there's no reason to keep this kind of inconvenience. What do
> you think?
>
> Regards,
> JL
>
> On Wed, Dec 12, 2018 at 3:08 PM Jeff Zhang  wrote:
>
>> Hi Jongyoul,
>>
>> Thanks for bringing this up. I don't understand how a different repo would
>> help here, but I have thought about moving interpreters out of Zeppelin for a
>> long time; I just don't have the bandwidth for it. The release cycle of the
>> Zeppelin core components (zeppelin-zengine, zeppelin-server) should not block
>> the release of the interpreter components (unless they depend on some features
>> of zeppelin-zengine or zeppelin-server).
>>
>>
>> Jongyoul Lee  wrote on Wednesday, December 12, 2018 at 10:38 AM:
>>
>> > Hi, dev and committers,
>> >
>> > Currently, I'm looking at the repositories of other Apache projects. They
>> > have several repositories with different purposes. I'd like to suggest
>> > that we divide our repositories between zeppelin-server and the others.
>> >
>> > This will help us develop zeppelin-server without interference from other
>> > components and their dependencies. Even in the case of interpreters, it
>> > will provide more independent environments for developing the interpreters
>> > themselves. Currently, we have a lot of dependencies and various versions
>> > for each interpreter.
>> >
>> > WDYT?
>> >
>> > Regards,
>> > JL
>> >
>> > --
>> > 이종열, Jongyoul Lee, 李宗烈
>> > http://madeng.net
>> >
>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>>
>
>
> --
> 이종열, Jongyoul Lee, 李宗烈
> http://madeng.net
>


--
이종열, Jongyoul Lee, 李宗烈
http://madeng.net


Re: [DISCUSS] Fwd: [NOTICE] Mandatory relocation of Apache git repositories on git-wip-us.apache.org

2018-12-11 Thread Felix Cheung
Yes that’s all it takes to migrate.

(And committers need to set up the gitbox link.)

Any more thoughts from the community?



From: Jongyoul Lee 
Sent: Tuesday, December 11, 2018 1:38 AM
To: dev
Subject: Re: [DISCUSS] Fwd: [NOTICE] Mandatory relocation of Apache git 
repositories on git-wip-us.apache.org

We could create a ticket for the infra only, correct?

On Mon, Dec 10, 2018 at 12:45 PM Jeff Zhang  wrote:

> Definitely +1 for earlier, anyone volunteer for this ?
>
>
> Jongyoul Lee  wrote on Monday, December 10, 2018 at 11:34 AM:
>
> > I don't think we have any special reason not to move there.
> >
> > +1 for earlier
> >
> > On Mon, Dec 10, 2018 at 3:56 AM Felix Cheung 
> > wrote:
> >
> > > Hi community,
> > >
> > > The move to gitbox is coming. This does not affect Contributors - mostly
> > > how PR is merged. We could choose to voluntarily move early, or wait till
> > > later.
> > >
> > > So to discuss, should we move early?
> > >
> > >
> > > -- Forwarded message -
> > > From: Daniel Gruno 
> > > Date: Fri, Dec 7, 2018 at 8:54 AM
> > > Subject: [NOTICE] Mandatory relocation of Apache git repositories on
> > > git-wip-us.apache.org
> > > To: us...@infra.apache.org 
> > >
> > >
> > > [IF YOUR PROJECT DOES NOT HAVE GIT REPOSITORIES ON GIT-WIP-US PLEASE
> > > DISREGARD THIS EMAIL; IT WAS MASS-MAILED TO ALL APACHE PROJECTS]
> > >
> > > Hello Apache projects,
> > >
> > > I am writing to you because you may have git repositories on the
> > > git-wip-us server, which is slated to be decommissioned in the coming
> > > months. All repositories will be moved to the new gitbox service which
> > > includes direct write access on github as well as the standard ASF
> > > commit access via gitbox.apache.org.
> > >
> > > ## Why this move? ##
> > > The move comes as a result of retiring the git-wip service, as the
> > > hardware it runs on is longing for retirement. In lieu of this, we
> > > have decided to consolidate the two services (git-wip and gitbox), to
> > > ease the management of our repository systems and future-proof the
> > > underlying hardware. The move is fully automated, and ideally, nothing
> > > will change in your workflow other than added features and access to
> > > GitHub.
> > >
> > > ## Timeframe for relocation ##
> > > Initially, we are asking that projects voluntarily request to move
> > > their repositories to gitbox, hence this email. The voluntary
> > > timeframe is between now and January 9th 2019, during which projects
> > > are free to either move over to gitbox or stay put on git-wip. After
> > > this phase, we will be requiring the remaining projects to move within
> > > one month, after which we will move the remaining projects over.
> > >
> > > To have your project moved in this initial phase, you will need:
> > >
> > > - Consensus in the project (documented via the mailing list)
> > > - File a JIRA ticket with INFRA to voluntarily move your project repos
> > > over to gitbox (as stated, this is highly automated and will take
> > > between a minute and an hour, depending on the size and number of
> > > your repositories)
> > >
> > > To sum up the preliminary timeline;
> > >
> > > - December 9th 2018 -> January 9th 2019: Voluntary (coordinated)
> > > relocation
> > > - January 9th -> February 6th: Mandated (coordinated) relocation
> > > - February 7th: All remaining repositories are mass migrated.
> > >
> > > This timeline may change to accommodate various scenarios.
> > >
> > > ## Using GitHub with ASF repositories ##
> > > When your project has moved, you are free to use either the ASF
> > > repository system (gitbox.apache.org) OR GitHub for your development
> > > and code pushes. To be able to use GitHub, please follow the primer
> > > at: https://reference.apache.org/committer/github
> > >
> > >
> > > We appreciate your understanding of this issue, and hope that your
> > > project can coordinate voluntarily moving your repositories in a
> > > timely manner.
> > >
> > > All settings, such as commit mail targets, issue linking, PR
> > > notification schemes etc will automatically be migrated to gitbox as
> > > well.
> > >
> > > With regards, Daniel on behalf of ASF Infra.
> > >
> > > PS:For inquiries, please reply to us...@infra.apache.org, not your
> > > project's dev list :-).
> > >
> > >
> > >
> > > -
> > > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > > For additional commands, e-mail: general-h...@incubator.apache.org
> > >
> >
> >
> > --
> > 이종열, Jongyoul Lee, 李宗烈
> > http://madeng.net
> >
>
>
> --
> Best Regards
>
> Jeff Zhang
>


--
이종열, Jongyoul Lee, 李宗烈
http://madeng.net


Re: [DISCUSS] Mandatory relocation of Apache git repositories on git-wip-us.apache.org

2018-12-10 Thread Felix Cheung
+1

On Mon, Dec 10, 2018 at 12:56 PM Roy Lenferink 
wrote:

> Hi all,
>
> The Apache Incubator still has a repository on git-wip-us as well
> [1].
>
> Does anyone have a problem with moving over the incubator repository to
> gitbox voluntarily?
> This means integrated access and easy PRs (write access to the GitHub
> repo).
>
> We need to document support for the decision from a mailing list post, so
> here it is.
>
> - Roy
>
> [1] https://git-wip-us.apache.org/repos/asf/incubator.git
>
> -- Forwarded message -
> From: Daniel Gruno 
> Date: vr 7 dec. 2018 om 17:53
> Subject: [NOTICE] Mandatory relocation of Apache git repositories on
> git-wip-us.apache.org
> To: us...@infra.apache.org 
>
> [IF YOUR PROJECT DOES NOT HAVE GIT REPOSITORIES ON GIT-WIP-US PLEASE
>   DISREGARD THIS EMAIL; IT WAS MASS-MAILED TO ALL APACHE PROJECTS]
>
> Hello Apache projects,
>
> I am writing to you because you may have git repositories on the
> git-wip-us server, which is slated to be decommissioned in the coming
> months. All repositories will be moved to the new gitbox service which
> includes direct write access on github as well as the standard ASF
> commit access via gitbox.apache.org.
>
> ## Why this move? ##
> The move comes as a result of retiring the git-wip service, as the
> hardware it runs on is longing for retirement. In lieu of this, we
> have decided to consolidate the two services (git-wip and gitbox), to
> ease the management of our repository systems and future-proof the
> underlying hardware. The move is fully automated, and ideally, nothing
> will change in your workflow other than added features and access to
> GitHub.
>
> ## Timeframe for relocation ##
> Initially, we are asking that projects voluntarily request to move
> their repositories to gitbox, hence this email. The voluntary
> timeframe is between now and January 9th 2019, during which projects
> are free to either move over to gitbox or stay put on git-wip. After
> this phase, we will be requiring the remaining projects to move within
> one month, after which we will move the remaining projects over.
>
> To have your project moved in this initial phase, you will need:
>
> - Consensus in the project (documented via the mailing list)
> - File a JIRA ticket with INFRA to voluntarily move your project repos
> over to gitbox (as stated, this is highly automated and will take
> between a minute and an hour, depending on the size and number of
> your repositories)
>
> To sum up the preliminary timeline;
>
> - December 9th 2018 -> January 9th 2019: Voluntary (coordinated)
> relocation
> - January 9th -> February 6th: Mandated (coordinated) relocation
> - February 7th: All remaining repositories are mass migrated.
>
> This timeline may change to accommodate various scenarios.
>
> ## Using GitHub with ASF repositories ##
> When your project has moved, you are free to use either the ASF
> repository system (gitbox.apache.org) OR GitHub for your development
> and code pushes. To be able to use GitHub, please follow the primer
> at: https://reference.apache.org/committer/github
>
>
> We appreciate your understanding of this issue, and hope that your
> project can coordinate voluntarily moving your repositories in a
> timely manner.
>
> All settings, such as commit mail targets, issue linking, PR
> notification schemes etc will automatically be migrated to gitbox as
> well.
>
> With regards, Daniel on behalf of ASF Infra.
>
> PS:For inquiries, please reply to us...@infra.apache.org, not your
> project's dev list :-).
>


Re: Questions about building the website of IoTDB Project

2018-12-09 Thread Felix Cheung
Looks good! Thanks for the mock ups.

On your homepage, references to Hadoop and Spark should be Apache Hadoop and 
Apache Spark, because of trademarks - and it might be good to link to their 
project pages.




From: Stefanie Zhao 
Sent: Friday, December 7, 2018 2:03 AM
To: 黄向东
Cc: dev@iotdb.apache.org; 徐毅
Subject: Re:Re: Questions about building the website of IoTDB Project

Dear all,


Prototype page links are as follows; questions and suggestions are welcome.
Here are the links:
【homepage】https://s1.ax1x.com/2018/12/07/F3E1T1.png
【download page】https://s1.ax1x.com/2018/12/07/F3ElwR.png
【documentation page】https://s1.ax1x.com/2018/12/07/F3E8Fx.png
【tools page】https://s1.ax1x.com/2018/12/07/F3EJfK.png
【other page】https://s1.ax1x.com/2018/12/07/F3EQm9.png


Best, Xinyi
--

Xinyi Zhao(Stefanie)
School of Software, Tsinghua University
E-mail:stefanie_...@163.com


At 2018-12-07 11:01:56, "Xiangdong Huang"  wrote:

Hi,


Xinyi has finished the prototype pages.


Yi Xu, can you build the website with several classmates?


You can follow some other apache project websites.


Note that the runtime is the Apache HTTP Server. Also, I noticed that many
existing projects use Jekyll; you could give it a try.


Xinyi, can you attach the prototype pages here as a JPEG? (I am not sure whether
the mailing list supports figures.)


Best,


---
Xiangdong Huang
School of Software, Tsinghua University


黄向东
清华大学 软件学院




Christofer Dutz  wrote on Friday, November 30, 2018 at 8:07 PM:

Hi all,

yeah ... been there ... done that ... guilty as charged.

Well in general Apache websites are simple static HTML/JS/CSS content served by 
a really fat Apache HTTPD server.

However, you don't copy stuff to the webserver directly. As mentioned before, you
usually set up a detached branch in a code repo named "asf-site" and then ask
infra to sync that with your project's website.

In the Apache PLC4X (incubating) project we generate the website as part of the 
maven build. Everything is automatically pushed to a dedicated website git 
repo's asf-site branch.

However this automatic push has to be executed on Jenkins nodes tagged with 
"git-websites" as only these have the credentials to automatically push to ASF 
code repos.

I will gladly help you get set up, as I know this can be quite a pain if you
don't know all the little details.


Chris



Am 30.11.18, 04:34 schrieb "Justin Mclean" :

HI,

And created for you [1]. Chris (one of the other mentors) has experience 
creating apache websites from scratch and may have some advice to offer.

Thanks,
Justin

1. https://github.com/apache/incubator-iotdb-website



Re: Incubator Podling Report (Due 5th December)

2018-12-05 Thread Felix Cheung
Hi - just a quick note that the report is due today.

Justin has sent a separate reminder. It’s important to put some quick status on 
this.

For example, from what I know:

- source repo migrated
- email list setup - dev@ traffic is a bit light
- website in progress
- no change to committer & PPMC
- ? contributor growth

If dev@ is ok with this summary I can also write this into the wiki page as the 
project report.



From: Felix Cheung 
Sent: Monday, December 3, 2018 9:58 AM
To: dev@pinot.apache.org; kishore g; dev@pinot.apache.org; jmcl...@apache.org
Subject: Re: Incubator Podling Report (Due 5th December)

Process info here
https://incubator.apache.org/guides/ppmc.html#podling_status_reports

It’s a wiki page
https://wiki.apache.org/incubator/December2018



From: Subbu Subramaniam 
Sent: Monday, December 3, 2018 9:39 AM
To: kishore g; dev@pinot.apache.org; jmcl...@apache.org
Subject: Re: Incubator Podling Report (Due 5th December)

What is involved here?

Having never done one before, I have no idea. Is there a template for the 
report?

thanks

-Subbu

From: kishore g 
Sent: Sunday, December 2, 2018 2:57 PM
To: dev@pinot.apache.org
Subject: Re: Incubator Podling Report (Due 5th December)

Subbu, do you want to take a stab at this?

On Sun, Dec 2, 2018 at 12:30 PM Justin Mclean  wrote:

> Hi,
>
> The Incubator PMC would appreciate it if you could complete the podling
> report on time; it's due on 5th December, in a few days. It takes time to
> prepare the incubator report, have your mentors sign off the report and for
> the board to review it, so it's best if you can get it in early.
>
> Thanks,
> Justin
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@pinot.apache.org
> For additional commands, e-mail: dev-h...@pinot.apache.org
>
>


Re: draft podling report

2018-12-05 Thread Felix Cheung
Great! I’ve added myself preemptively to the wiki and signed off the Crail 
report.

Could one of you please add me on Whimsy, if it makes sense, as a mentor? Thanks



From: Luciano Resende 
Sent: Tuesday, December 4, 2018 2:01 PM
To: dev@crail.apache.org
Subject: Re: draft podling report

+1, have already signed off.

On Tue, Dec 4, 2018 at 9:30 AM Julian Hyde  wrote:
>
> Looks good. Thanks for doing this promptly. I'll sign off tomorrow
> after it is final.
>
> Felix, As a new mentor, feel free to add your name below the report,
> sign off, and add comments.
>
> Julian
>
> On Tue, Dec 4, 2018 at 9:18 AM bernard metzler  wrote:
> >
> > Dear all, I added a draft podling report for Crail to
> > https://wiki.apache.org/incubator/December2018. Please
> > check if it makes all sense. Dear mentors, as usual,
> > be frank if something looks suspicious or is inappropriate,
> > or missing. I think we can still edit until tomorrow
> > evening.
> >
> > Thanks a lot,
> > Bernard.



--
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/


Incubator Wiki write access

2018-12-04 Thread Felix Cheung
The wiki might have lost my earlier account.

My username is FelixCheung

Thanks


Re: [VOTE] Apache Crail 1.1-incubating (rc8)

2018-12-03 Thread Felix Cheung
Thanks for getting back to me, Jonas.

Maybe I didn't get the change in CRAIL-74. BTW, some of the JIRAs do not
have links to the GitHub PR?

Rat - yes, thanks, I think I was looking at an older change. I double-checked and
see the pom file rat exclusion does not include docker or doc.


On Mon, Dec 3, 2018 at 12:29 AM Jonas Pfefferle  wrote:

> Hi Felix
>
>
>   On Fri, 30 Nov 2018 15:43:45 -0800
>   Felix Cheung  wrote:
> > +1 (binding)
> >
> > a few comments below, checked:
> > filename
> > signature & hash
> > DISCLAIMER, LICENSE, NOTICE
> > build from src
> > no binary
> > src files have headers (see below)
> >
> > comments, not blocker for release IMO:
> > 1.
> > CREDITS file is a bit non-standard in an ASF release - this is generally
> > not included as it is already captured in git history and SGA
>
> The CREDITS was introduced for the past IBM copyright notice:
> https://jira.apache.org/jira/projects/CRAIL/issues/CRAIL-33
>
> >
> > 2.
> > https://jira.apache.org/jira/projects/CRAIL/issues/CRAIL-74 is marked as
> > Fixed but I don't see a change in the -bin tarball?
>
> At least on my machine the binary tarball now has a toplevel directory.
> Can
> someone else confirm?
>
> >
> > 3.
> > licenses/ directory does not need to include those from ASF and on
> > Apache v2 license, e.g.
> > apache-crail-1.1-incubating/licenses $ grep -e "Apache" *
> > LICENSE.commons-logging.txt: Apache License
> > LICENSE.commons-math3-3.1.1: Apache License
>
> Makes sense, we will remove them on the next release.
>
> >
> > 4.
> > Doc mentions Libdisni is a requirement - it might help to list the
> > supported/tested releases of Libdisni
>
> I agree, the requirements for building/running Crail need to be fixed.
> What you need very much depends on which datatiers you want to run:
> https://jira.apache.org/jira/projects/CRAIL/issues/CRAIL-68
>
> >
> > 5.
> > ASF header - docker/*, doc/* and conf/* can also have the ASF header as a
> > comment block - consider adding that
>
> docker/* and doc/* do have ASF headers; the only things excluded are
> conf/*, credits and licenses.
> Not sure what the point is of putting ASF headers in configuration file
> templates. I have checked multiple other projects and none had any.
>
>
> Thanks,
> Jonas
>
> >
> >
> > On Thu, Nov 29, 2018 at 6:50 AM Adrian Schuepbach
> >
> > wrote:
> >
> >> Hi all
> >>
> >> Please vote to approve the release of Apache Crail 1.1-incubating (rc8).
> >>
> >> The podling dev vote thread:
> >>
> >> https://www.mail-archive.com/dev@crail.apache.org/msg00519.html
> >>
> >> The result:
> >>
> >> https://www.mail-archive.com/dev@crail.apache.org/msg00526.html
> >>
> >> Commit hash: 08c75b55f7f97be869049cf80a0da5347e550a3d
> >>
> >>
> >>
> https://git-wip-us.apache.org/repos/asf?p=incubator-crail.git;a=commit;h=08c75b55f7f97be869049cf80a0da5347e550a3d
> >>
> >>
> >> Release files can be found at:
> >> https://dist.apache.org/repos/dist/dev/incubator/crail/1.1-rc8/
> >>
> >> The Nexus Staging URL:
> >> https://repository.apache.org/content/repositories/orgapachecrail-1007/
> >>
> >> Release artifacts are signed with the following key:
> >> https://www.apache.org/dist/incubator/crail/KEYS
> >>
> >> For information about the contents of this release, see:
> >>
> >>
> https://git-wip-us.apache.org/repos/asf?p=incubator-crail.git;a=blob_plain;f=HISTORY.md;hb=08c75b55f7f97be869049cf80a0da5347e550a3d
> >> or
> >>
> >>
> https://github.com/apache/incubator-crail/blob/08c75b55f7f97be869049cf80a0da5347e550a3d/HISTORY.md
> >>
> >> The vote is open for at least 72 hours and passes if a majority of at
> >> least 3 +1 PMC votes are cast.
> >>
> >> [ ] +1 Release this package as Apache Crail 1.1-incubating
> >> [ ] -1 Do not release this package because ...
> >>
> >> Thanks,
> >> Adrian
> >>
> >>
>
>
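
A note on the rat/header points above: the kind of exclusion being discussed
(conf/* templates, CREDITS and bundled license texts skipped; docker/* and doc/*
checked) is easy to sanity-check outside of a full Maven run. A minimal sketch;
the marker is a phrase from the standard ASF header, while the exclusion list and
file extensions below are illustrative, not Crail's actual rat configuration:

    import pathlib
    import sys

    ASF_MARKER = "Licensed to the Apache Software Foundation"
    # Illustrative exclusions mirroring the discussion: config templates,
    # CREDITS and bundled third-party license texts are skipped.
    EXCLUDED_PREFIXES = ("conf/", "licenses/", "CREDITS")
    CHECKED_SUFFIXES = {".java", ".scala", ".py", ".sh", ".xml"}

    def files_missing_header(root):
        root = pathlib.Path(root)
        missing = []
        for path in root.rglob("*"):
            rel = path.relative_to(root).as_posix()
            if not path.is_file() or rel.startswith(EXCLUDED_PREFIXES):
                continue
            if path.suffix not in CHECKED_SUFFIXES:
                continue
            # Only the top of the file needs to be scanned for the header.
            head = path.read_text(errors="ignore")[:2000]
            if ASF_MARKER not in head:
                missing.append(rel)
        return missing

    if __name__ == "__main__":
        for rel in files_missing_header(sys.argv[1] if len(sys.argv) > 1 else "."):
            print("missing ASF header:", rel)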


Re: Incubator Podling Report (Due 5th December)

2018-12-03 Thread Felix Cheung
Process info here
https://incubator.apache.org/guides/ppmc.html#podling_status_reports

It’s a wiki page
https://wiki.apache.org/incubator/December2018



From: Subbu Subramaniam 
Sent: Monday, December 3, 2018 9:39 AM
To: kishore g; dev@pinot.apache.org; jmcl...@apache.org
Subject: Re: Incubator Podling Report (Due 5th December)

What is involved here?

Having never done one before, I have no idea. Is there a template for the 
report?

thanks

-Subbu

From: kishore g 
Sent: Sunday, December 2, 2018 2:57 PM
To: dev@pinot.apache.org
Subject: Re: Incubator Podling Report (Due 5th December)

Subbu, do you want to take a stab at this?

On Sun, Dec 2, 2018 at 12:30 PM Justin Mclean  wrote:

> Hi,
>
> The Incubator PMC would appreciate it if you could complete the podling
> report on time; it's due on 5th December, in a few days. It takes time to
> prepare the incubator report, have your mentors sign off the report and for
> the board to review it, so it's best if you can get it in early.
>
> Thanks,
> Justin
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@pinot.apache.org
> For additional commands, e-mail: dev-h...@pinot.apache.org
>
>


[jira] [Updated] (SPARK-26247) SPIP - ML Model Extension for no-Spark MLLib Online Serving

2018-12-02 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-26247:
-
Description: 
This ticket tracks an SPIP to improve model load time and model serving 
interfaces for online serving of Spark MLlib models.  The SPIP is here

[https://docs.google.com/a/uber.com/document/d/e/2PACX-1vRttVNNMBt4pBU2oBWKoiK3-7PW6RDwvHNgSMqO67ilxTX_WUStJ2ysUdAk5Im08eyHvlpcfq1g-DLF/pub]

 

The improvement opportunity exists in all versions of spark.  We developed our 
set of changes wrt version 2.1.0 and can port them forward to other versions 
(e.g., we have ported them forward to 2.3.2).

  was:
This ticket tracks an SPIP to improve model load time and model serving 
interfaces for online serving of Spark MLlib models.  The SPIP is here

[https://docs.google.com/a/uber.com/document/d/e/2PACX-1vRttVNNMBt4pBU2oBWKoiK3-7PW6RDwvHNgSMqO67ilxTX_WUStJ2ysUdAk5Im08eyHvlpcfq1g-DLF/pub]

 

The improvement opportunity exists in all versions of spark.  We developed our 
set of changes wrt version 2.1.0 and can port them forward to other versions 
(e.g., wehave ported them forward to 2.3.2).


> SPIP - ML Model Extension for no-Spark MLLib Online Serving
> ---
>
> Key: SPARK-26247
> URL: https://issues.apache.org/jira/browse/SPARK-26247
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 2.1.0
>Reporter: Anne Holler
>Priority: Major
>  Labels: SPIP
>
> This ticket tracks an SPIP to improve model load time and model serving 
> interfaces for online serving of Spark MLlib models.  The SPIP is here
> [https://docs.google.com/a/uber.com/document/d/e/2PACX-1vRttVNNMBt4pBU2oBWKoiK3-7PW6RDwvHNgSMqO67ilxTX_WUStJ2ysUdAk5Im08eyHvlpcfq1g-DLF/pub]
>  
> The improvement opportunity exists in all versions of spark.  We developed 
> our set of changes wrt version 2.1.0 and can port them forward to other 
> versions (e.g., we have ported them forward to 2.3.2).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
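
For context on the SPIP above: the "model load time" and serving-interface concern
stems from MLlib's standard persistence path, where loading a saved model goes
through Spark's readers and needs a live SparkSession even to score a single row
online. A minimal, self-contained sketch of that load path, in local mode with toy
data; the save path and feature names are made up for illustration:

    import time

    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline, PipelineModel
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.feature import VectorAssembler

    spark = SparkSession.builder.master("local[1]").appName("mllib-load-demo").getOrCreate()

    # Fit and persist a tiny pipeline so the load path below has something to read.
    df = spark.createDataFrame([(0.0, 1.0, 0.0), (1.0, 0.0, 1.0)], ["a", "b", "label"])
    pipeline = Pipeline(stages=[
        VectorAssembler(inputCols=["a", "b"], outputCol="features"),
        LogisticRegression(maxIter=5),
    ])
    pipeline.fit(df).write().overwrite().save("/tmp/demo_pipeline_model")

    # The online-serving pain point: reloading goes through Spark's readers, so a
    # SparkSession (and the Spark runtime) is required just to score one row.
    start = time.time()
    model = PipelineModel.load("/tmp/demo_pipeline_model")
    print("load took %.2fs" % (time.time() - start))
    print(model.transform(df.limit(1)).select("prediction").collect())

    spark.stop()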



[jira] [Updated] (SPARK-26247) SPIP - ML Model Extension for no-Spark MLLib Online Serving

2018-12-02 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-26247:
-
Target Version/s: 3.0.0  (was: 2.1.0)

> SPIP - ML Model Extension for no-Spark MLLib Online Serving
> ---
>
> Key: SPARK-26247
> URL: https://issues.apache.org/jira/browse/SPARK-26247
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 2.1.0
>Reporter: Anne Holler
>Priority: Major
>  Labels: SPIP
>
> This ticket tracks an SPIP to improve model load time and model serving 
> interfaces for online serving of Spark MLlib models.  The SPIP is here
> [https://docs.google.com/a/uber.com/document/d/e/2PACX-1vRttVNNMBt4pBU2oBWKoiK3-7PW6RDwvHNgSMqO67ilxTX_WUStJ2ysUdAk5Im08eyHvlpcfq1g-DLF/pub]
>  
> The improvement opportunity exists in all versions of spark.  We developed 
> our set of changes wrt version 2.1.0 and can port them forward to other 
> versions (e.g., wehave ported them forward to 2.3.2).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26247) SPIP - ML Model Extension for no-Spark MLLib Online Serving

2018-12-02 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-26247:
-
Fix Version/s: (was: 2.1.0)

> SPIP - ML Model Extension for no-Spark MLLib Online Serving
> ---
>
> Key: SPARK-26247
> URL: https://issues.apache.org/jira/browse/SPARK-26247
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 2.1.0
>Reporter: Anne Holler
>Priority: Major
>  Labels: SPIP
>
> This ticket tracks an SPIP to improve model load time and model serving 
> interfaces for online serving of Spark MLlib models.  The SPIP is here
> [https://docs.google.com/a/uber.com/document/d/e/2PACX-1vRttVNNMBt4pBU2oBWKoiK3-7PW6RDwvHNgSMqO67ilxTX_WUStJ2ysUdAk5Im08eyHvlpcfq1g-DLF/pub]
>  
> The improvement opportunity exists in all versions of spark.  We developed 
> our set of changes wrt version 2.1.0 and can port them forward to other 
> versions (e.g., wehave ported them forward to 2.3.2).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


