Re: Close PR right on Github page

2019-06-12 Thread Felix Cheung
Adrian, did you link your account? You should be able to close or merge PRs once
you have set it up.

https://gitbox.apache.org/setup/


From: Adrian Schüpbach 
Sent: Tuesday, June 11, 2019 2:41:54 PM
To: dev@crail.apache.org
Subject: Re: Close PR right on Github page

Hi Julian

Thanks for your answer. Right, "Close #nnn" has worked for me.
In some cases, though, I'd prefer that a PR we really cannot accept
does not even create any history, so a commit with "Close #nnn"
would not be the right approach in those cases.

I will ask INFRA how to set permissions. Since some of us
have the "Close" button in the GitHub UI, it must be
possible to configure this somehow.


Thanks
Adrian


On 11.06.2019 19:48, Julian Hyde wrote:
> I suspect that closing a PR from the GitHub UI may be possible now that we have 
> gitbox, if permissions are set correctly. But I don’t know how to set 
> permissions. You could try asking INFRA in HipChat.
>
> The method I use isn’t pretty but it has worked for a long time. Add a 
> comment ‘Close #nnn’ or ‘Closes #nnn’ or ‘Close apache/incubator-crail#nnn’ 
> on its own line in a commit message. When the commit is merged into master 
> the bot will close the PR.
>
> Julian
>
>
>> On Jun 11, 2019, at 1:05 AM, Adrian Schuepbach  wrote:
>>
>> Hi Julian
>>
>> It seems that I do not have the right to close (or merge) a PR. The
>> button just does not appear for me. I thought that when I linked my GitHub
>> account with my Apache ID I got the rights, but now I don't have them.
>>
>> Might those rights have been lost after migrating to gitbox?
>>
>> I am still able to merge changes via the command line. But it would
>> be great if I could also close (or merge) PRs on the website.
>>
>> What should I do to get these rights again?
>>
>> Thanks
>> Adrian
>>
>>
>
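For illustration, the "Close #nnn" convention Julian describes above looks roughly
like this in a commit message (the summary line and PR number here are made up):

    [MINOR] Fix typo in configuration docs

    Closes #123

When such a commit is merged into master, the GitBox bot closes the referenced
pull request, as noted above.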



Re: Podling reports due Wednesday

2019-06-10 Thread Felix Cheung
I don’t mind taking on a mentor role for Spot, but it seems the volume of
interaction (commits, dev@) has gone down significantly in the last 6-9
months?


On Mon, Jun 10, 2019 at 3:21 PM Justin Mclean 
wrote:

> Hi,
>
> Currently we’re missing only one sign off and that is for:
> Spot
>
> Which, I forgot to mention in my last list.
>
> This is one podling that needs more mentors, is anyone available to help?
>
> Thanks,
> Justin
> -
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org
>
>


Re: Spark SQL in R?

2019-06-08 Thread Felix Cheung
I don’t think you should get a hive-site.xml from the internet.

It should have connection information for a running Hive metastore. If you
don’t have a Hive metastore service, since you are running locally (from a laptop?),
then you don’t really need it. You can get Spark to work with its own.
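
For illustration, a minimal sketch of that last point in Scala (for example from
spark-shell, assuming a Spark build that bundles the Hive classes, such as the
spark-2.4.3-bin-hadoop2.7 download mentioned below): with no hive-site.xml on the
classpath, Spark falls back to its own embedded Derby metastore and a local
spark-warehouse directory.

import org.apache.spark.sql.SparkSession

// No hive-site.xml anywhere on the classpath: Spark creates its own
// embedded metastore (Derby) plus a ./spark-warehouse directory.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("local-metastore-sketch")   // the app name is arbitrary
  .enableHiveSupport()
  .getOrCreate()

spark.sql("CREATE DATABASE IF NOT EXISTS learnsql")
spark.sql("SHOW DATABASES").show()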




From: ya 
Sent: Friday, June 7, 2019 8:26:27 PM
To: Rishikesh Gawade; felixcheun...@hotmail.com; user@spark.apache.org
Subject: Spark SQL in R?

Dear Felix and Rishikesh and list,

Thank you very much for your previous help. So far I have tried two ways to 
trigger Spark SQL: one is to use R with the sparklyr and SparkR libraries; 
the other is to use the SparkR shell from Spark. I am not connecting to a remote 
Spark cluster, but a local one. Both failed with or without hive-site.xml. I 
suspect the content of the hive-site.xml I found online was not appropriate for 
this case, as the Spark session cannot be initialized after adding this 
hive-site.xml. My questions are:

1. Is there any example for the content of hive-site.xml for this case?

2. I used the sql() function to call Spark SQL; is this the right way to do it?

###
##Here is the content in the hive-site.xml:##
###



<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://192.168.76.100:3306/hive?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123</value>
    <description>password to use against metastore database</description>
  </property>
</configuration>






##Here is what happened in R:##


> library(sparklyr) # load sparklyr package
> sc=spark_connect(master="local",spark_home="/Users/ya/Downloads/soft/spark-2.4.3-bin-hadoop2.7")
>  # connect sparklyr with spark
> sql('create database learnsql')
Error in sql("create database learnsql") : could not find function "sql"
> library(SparkR)

Attaching package: ‘SparkR’

The following object is masked from ‘package:sparklyr’:

collect

The following objects are masked from ‘package:stats’:

cov, filter, lag, na.omit, predict, sd, var, window

The following objects are masked from ‘package:base’:

as.data.frame, colnames, colnames<-, drop, endsWith, intersect, rank, rbind,
sample, startsWith, subset, summary, transform, union

> sql('create database learnsql')
Error in getSparkSession() : SparkSession not initialized
> Sys.setenv(SPARK_HOME='/Users/ya/Downloads/soft/spark-2.4.3-bin-hadoop2.7')
> sparkR.session(sparkHome=Sys.getenv('/Users/ya/Downloads/soft/spark-2.4.3-bin-hadoop2.7'))
Spark not found in SPARK_HOME:
Spark package found in SPARK_HOME: 
/Users/ya/Downloads/soft/spark-2.4.3-bin-hadoop2.7
Launching java with spark-submit command 
/Users/ya/Downloads/soft/spark-2.4.3-bin-hadoop2.7/bin/spark-submit   
sparkr-shell 
/var/folders/d8/7j6xswf92c3gmhwy_lrk63pmgn/T//Rtmpz22kK9/backend_port103d4cfcfd2c
19/06/08 11:14:57 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
setLogLevel(newLevel).
Error in handleErrors(returnStatus, conn) :

…... hundreds of lines of information and mistakes here ……

> sql('create database learnsql')
Error in getSparkSession() : SparkSession not initialized



###
##Here is what happened in SparkR shell:##


Error in handleErrors(returnStatus, conn) :
  java.lang.IllegalArgumentException: Error while instantiating 
'org.apache.spark.sql.hive.HiveSessionStateBuilder':
at 
org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1107)
at 
org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:145)
at 
org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:144)
at scala.Option.getOrElse(Option.scala:121)
at 
org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:144)
at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:141)
at 
org.apache.spark.sql.api.r.SQLUtils$$anonfun$setSparkContextSessionConf$2.apply(SQLUtils.scala:80)
at 
org.apache.spark.sql.api.r.SQLUtils$$anonfun$setSparkContextSessionConf$2.apply(SQLUtils.scala:79)
at 
scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:733)
at scala.collection.Iterator$class.foreach(Iterator.sca
> sql('create database learnsql')
Error in getSparkSession() : SparkSession not initialized



Thank you very much.

YA







On Jun 8, 2019, at 1:44 AM, Rishikesh Gawade 

Re: sparksql in sparkR?

2019-06-07 Thread Felix Cheung
This seems to be more a question about the spark-sql shell? I'd suggest you change 
the email title to get more attention.


From: ya 
Sent: Wednesday, June 5, 2019 11:48:17 PM
To: user@spark.apache.org
Subject: sparksql in sparkR?

Dear list,

I am trying to use Spark SQL within R. I have the following questions; 
could you give me some advice, please? Thank you very much.

1. I connect my R and Spark using the SparkR library; probably some of the 
members here are also R users? Do I understand correctly that Spark SQL can be 
connected and triggered via SparkR and used in R (not in the SparkR shell of Spark)?

2. I loaded the SparkR library in R, trying to create a new SQL database and a table, 
but I could not get the database and the table I want. The code looks like below:

library(SparkR)
Sys.setenv(SPARK_HOME='/Users/ya/Downloads/soft/spark-2.4.3-bin-hadoop2.7')
sparkR.session(sparkHome=Sys.getenv('/Users/ya/Downloads/soft/spark-2.4.3-bin-hadoop2.7'))
sql("create database learnsql; use learnsql")
sql("
create table employee_tbl
(emp_id varchar(10) not null,
emp_name char(10) not null,
emp_st_addr char(10) not null,
emp_city char(10) not null,
emp_st char(10) not null,
emp_zip integer(5) not null,
emp_phone integer(10) null,
emp_pager integer(10) null);
insert into employee_tbl values ('0001','john','yanlanjie 
1','gz','jiaoqiaojun','510006','1353');
select*from employee_tbl;
")

I ran the following code in the spark-sql shell; I get the database learnsql, 
however, I still can’t get the table.

spark-sql> create database learnsql;show databases;
19/06/06 14:42:36 INFO HiveMetaStore: 0: create_database: 
Database(name:learnsql, description:, 
locationUri:file:/Users/ya/spark-warehouse/learnsql.db, parameters:{})
19/06/06 14:42:36 INFO audit: ugi=yaip=unknown-ip-addr  
cmd=create_database: Database(name:learnsql, description:, 
locationUri:file:/Users/ya/spark-warehouse/learnsql.db, parameters:{})
Error in query: org.apache.hadoop.hive.metastore.api.AlreadyExistsException: 
Database learnsql already exists;

spark-sql> create table employee_tbl
 > (emp_id varchar(10) not null,
 > emp_name char(10) not null,
 > emp_st_addr char(10) not null,
 > emp_city char(10) not null,
 > emp_st char(10) not null,
 > emp_zip integer(5) not null,
 > emp_phone integer(10) null,
 > emp_pager integer(10) null);
Error in query:
no viable alternative at input 'create table employee_tbl\n(emp_id varchar(10) 
not'(line 2, pos 20)

== SQL ==
create table employee_tbl
(emp_id varchar(10) not null,
^^^
emp_name char(10) not null,
emp_st_addr char(10) not null,
emp_city char(10) not null,
emp_st char(10) not null,
emp_zip integer(5) not null,
emp_phone integer(10) null,
emp_pager integer(10) null)

spark-sql> insert into employee_tbl values ('0001','john','yanlanjie 
1','gz','jiaoqiaojun','510006','1353');
19/06/06 14:43:43 INFO HiveMetaStore: 0: get_table : db=default tbl=employee_tbl
19/06/06 14:43:43 INFO audit: ugi=yaip=unknown-ip-addr  cmd=get_table : 
db=default tbl=employee_tbl
Error in query: Table or view not found: employee_tbl; line 1 pos 0


Does Spark SQL have a different syntax? What did I miss?

Thank you very much.

Best regards,

YA




-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
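
For reference, a hedged sketch of how the DDL above could be written so that
Spark SQL 2.x accepts it: one statement per call, plain Spark SQL types, and no
NOT NULL column constraints (which is what the parse error above is pointing at).
Shown in Scala, assuming a Hive-enabled SparkSession named spark; the same
statement strings can be passed to SparkR's sql().

// One statement per call; semicolon-separated multi-statement strings are not accepted.
spark.sql("CREATE DATABASE IF NOT EXISTS learnsql")
spark.sql("USE learnsql")
spark.sql("""
  CREATE TABLE employee_tbl (
    emp_id      STRING,
    emp_name    STRING,
    emp_st_addr STRING,
    emp_city    STRING,
    emp_st      STRING,
    emp_zip     INT,
    emp_phone   BIGINT,
    emp_pager   BIGINT
  )
""")
// The original INSERT lists 7 values for 8 columns; a NULL is added for emp_pager here.
spark.sql("INSERT INTO employee_tbl VALUES " +
  "('0001', 'john', 'yanlanjie 1', 'gz', 'jiaoqiaojun', 510006, 1353, NULL)")
spark.sql("SELECT * FROM employee_tbl").show()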



Re: Missing podling logos

2019-06-05 Thread Felix Cheung
Awesome


From: Seunghyun Lee 
Sent: Wednesday, June 5, 2019 2:01:00 PM
To: dev@pinot.apache.org
Subject: Re: Missing podling logos

It's updated automatically. Our logo is now available
http://www.apache.org/logos/?#pinot



On Wed, Jun 5, 2019 at 11:21 AM Seunghyun Lee  wrote:

> I have uploaded our logo to
> https://svn.apache.org/repos/asf/comdev/project-logos/originals/
>
> I followed the instruction from http://www.apache.org/logos/about.html
>
> Does this need some time to get propagated to the logo page?
>
> Seunghyun
>
> On Mon, Jun 3, 2019 at 11:04 PM Felix Cheung 
> wrote:
>
>> http://www.apache.org/logos/
>>
>> 
>> From: Subbu Subramaniam 
>> Sent: Monday, June 3, 2019 9:00 AM
>> To: dev@pinot.apache.org
>> Subject: Re: Missing podling logos
>>
>> Thanks for the reminder Felix.
>>
>> Where exactly do we add the logo? We do have one in this page:
>>
>> https://pinot.incubator.apache.org/
>>
>> thanks
>>
>> -Subbu
>> 
>> From: Felix Cheung 
>> Sent: Sunday, June 2, 2019 11:51 AM
>> To: dev@pinot.apache.org
>> Subject: Fwd: Missing podling logos
>>
>> Quick reminder...
>>
>> -- Forwarded message -
>> From: Justin Mclean 
>> Date: Sat, Jun 1, 2019 at 7:37 PM
>> Subject: Re: Missing podling logos
>> To: 
>>
>>
>> Hi,
>>
>> It’s good to see some podlings have added their logos, but we’re still
>> missing a few.
>>
>> Missing logos are:
>> amaterasu
>> annotator
>> batchee
>> brpc
>> datasketches
>> dlab
>> flagon
>> gobblin
>> hudi
>> omid
>> pinot
>> s2graph
>> samoa
>> sdap
>> tuweni
>> tvm
>> weex
>> zipkin
>>
>> With some conferences coming up there are going to be stickers made, and we'll
>> need your logo for that to happen.
>>
>> Thanks,
>> Justin
>> -
>> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
>> For additional commands, e-mail: general-h...@incubator.apache.org
>>
>


Re: Missing podling logos

2019-06-04 Thread Felix Cheung
http://www.apache.org/logos/


From: Subbu Subramaniam 
Sent: Monday, June 3, 2019 9:00 AM
To: dev@pinot.apache.org
Subject: Re: Missing podling logos

Thanks for the reminder Felix.

Where exactly do we add the logo? We do have one in this page:

https://pinot.incubator.apache.org/

thanks

-Subbu

From: Felix Cheung 
Sent: Sunday, June 2, 2019 11:51 AM
To: dev@pinot.apache.org
Subject: Fwd: Missing podling logos

Quick reminder...

-- Forwarded message -
From: Justin Mclean 
Date: Sat, Jun 1, 2019 at 7:37 PM
Subject: Re: Missing podling logos
To: 


Hi,

It’s good to see some podlings have added their logos, but we’re still
missing a few.

Missing logos are:
amaterasu
annotator
batchee
brpc
datasketches
dlab
flagon
gobblin
hudi
omid
pinot
s2graph
samoa
sdap
tuweni
tvm
weex
zipkin

With some conferences coming up there are going to be stickers made, and we'll
need your logo for that to happen.

Thanks,
Justin
-
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org


Fwd: Missing podling logos

2019-06-02 Thread Felix Cheung
Quick reminder...

-- Forwarded message -
From: Justin Mclean 
Date: Sat, Jun 1, 2019 at 7:37 PM
Subject: Re: Missing podling logos
To: 


Hi,

It’s good to see some podlings have added their logos, but we’re still
missing a few.

Missing logos are:
amaterasu
annotator
batchee
brpc
datasketches
dlab
flagon
gobblin
hudi
omid
pinot
s2graph
samoa
sdap
tuweni
tvm
weex
zipkin

With some conferences coming up there are going to be stickers made, and we'll
need your logo for that to happen.

Thanks,
Justin
-
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org


Re: podling report June

2019-06-01 Thread Felix Cheung
About me - yes, it was discussed on list and I wasn’t able to change the roster 
myself at the time.




From: Justin Mclean 
Sent: Thursday, May 30, 2019 8:00 PM
To: dev@crail.apache.org
Subject: Re: podling report June

Hi,

Thanks for submitting the report early; much appreciated, and it makes my job 
easier.

A few minor things that, if you could fix them, would be appreciated.
- You're missing an answer to "Any issues that the Incubator PMC (IPMC) or ASF 
Board wish/need to be aware of?" (Or rather I think markup is hiding your 
answer.)
- Felix is not listed as a mentor of your project in the roster [1]
- Make sure that each line under the ### headings starts with two spaces

Thanks,
Justin


1. https://whimsy.apache.org/roster/ppmc/crail


Re: Should python-2 be supported in Spark 3.0?

2019-05-31 Thread Felix Cheung
Very subtle, but someone might take

“We will drop Python 2 support in a future release in 2020”

to mean any / the first release in 2020, whereas the next statement indicates patch 
releases are not included in the above. It might help to reorder the items or clarify 
the wording.



From: shane knapp 
Sent: Friday, May 31, 2019 7:38:10 PM
To: Denny Lee
Cc: Holden Karau; Bryan Cutler; Erik Erlandson; Felix Cheung; Mark Hamstra; 
Matei Zaharia; Reynold Xin; Sean Owen; Wenchen Fen; Xiangrui Meng; dev; user
Subject: Re: Should python-2 be supported in Spark 3.0?

+1000  ;)

On Sat, Jun 1, 2019 at 6:53 AM Denny Lee 
mailto:denny.g@gmail.com>> wrote:
+1

On Fri, May 31, 2019 at 17:58 Holden Karau 
mailto:hol...@pigscanfly.ca>> wrote:
+1

On Fri, May 31, 2019 at 5:41 PM Bryan Cutler 
mailto:cutl...@gmail.com>> wrote:
+1 and the draft sounds good

On Thu, May 30, 2019, 11:32 AM Xiangrui Meng 
mailto:men...@gmail.com>> wrote:
Here is the draft announcement:

===
Plan for dropping Python 2 support

As many of you already know, the Python core development team and many widely used 
Python packages like Pandas and NumPy will drop Python 2 support on or before 
2020/01/01. Apache Spark has supported both Python 2 and 3 since the Spark 1.4 
release in 2015. However, maintaining Python 2/3 compatibility is an increasing 
burden, and it essentially limits the use of Python 3 features in Spark. Given 
that the end of life (EOL) of Python 2 is coming, we plan to eventually drop Python 
2 support as well. The current plan is as follows:

* In the next major release in 2019, we will deprecate Python 2 support. 
PySpark users will see a deprecation warning if Python 2 is used. We will 
publish a migration guide for PySpark users to migrate to Python 3.
* We will drop Python 2 support in a future release in 2020, after Python 2 EOL 
on 2020/01/01. PySpark users will see an error if Python 2 is used.
* For releases that support Python 2, e.g., Spark 2.4, their patch releases 
will continue supporting Python 2. However, after Python 2 EOL, we might not 
take patches that are specific to Python 2.
===

Sean helped make a pass. If it looks good, I'm going to upload it to Spark 
website and announce it here. Let me know if you think we should do a VOTE 
instead.

On Thu, May 30, 2019 at 9:21 AM Xiangrui Meng 
mailto:men...@gmail.com>> wrote:
I created https://issues.apache.org/jira/browse/SPARK-27884 to track the work.

On Thu, May 30, 2019 at 2:18 AM Felix Cheung 
mailto:felixcheun...@hotmail.com>> wrote:
We don’t usually reference a future release on the website

> Spark website and state that Python 2 is deprecated in Spark 3.0

I suspect people will then ask when Spark 3.0 is coming out. We might need to 
provide some clarity on that.

We can say the "next major release in 2019" instead of Spark 3.0. The Spark 3.0 
timeline certainly requires a new thread to discuss.




From: Reynold Xin mailto:r...@databricks.com>>
Sent: Thursday, May 30, 2019 12:59:14 AM
To: shane knapp
Cc: Erik Erlandson; Mark Hamstra; Matei Zaharia; Sean Owen; Wenchen Fen; 
Xiangrui Meng; dev; user
Subject: Re: Should python-2 be supported in Spark 3.0?

+1 on Xiangrui’s plan.

On Thu, May 30, 2019 at 7:55 AM shane knapp 
mailto:skn...@berkeley.edu>> wrote:
I don't have a good sense of the overhead of continuing to support
Python 2; is it large enough to consider dropping it in Spark 3.0?

from the build/test side, it will actually be pretty easy to continue support 
for python2.7 for spark 2.x as the feature sets won't be expanding.

that being said, i will be cracking a bottle of champagne when i can delete all 
of the ansible and anaconda configs for python2.x.  :)

On the development side, in a future release that drops Python 2 support we can 
remove code that maintains python 2/3 compatibility and start using python 3 
only features, which is also quite exciting.


shane
--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


--
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.): https://amzn.to/2MaRAG9 
<https://amzn.to/2MaRAG9>
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu



Re: Zeppelin log files in windows

2019-05-31 Thread Felix Cheung
Thanks for the report!

Other projects have testing set up on the likes of AppVeyor. We would greatly 
appreciate any contribution in this area to fix the issue and enable continuous 
testing coverage!


From: Jeff Zhang 
Sent: Tuesday, May 28, 2019 6:26:43 AM
To: users
Subject: Re: Zeppelin log files in windows

Hi Ravi,

Sorry for the inconvenience. The community has no bandwidth to maintain the 
stability of Zeppelin on Windows. I would recommend installing Zeppelin on 
Linux if that works for you.


Ravi Pullareddy 
mailto:ravi.pullare...@minlog.com.au>> 
于2019年5月28日周二 上午9:24写道:
Hi Folks

The Windows version of Zeppelin 0.8.1 has a typo on line 76 of command.cmd: 
there is a curly brace ‘}’ in place of a closing bracket ‘)’. I request that you 
correct this and publish it. Apart from this trivial error, Zeppelin logs to the 
console but does not create a log file on Windows. My log4j.properties file is 
below. Please check and let me know if I am missing something.

log4j.rootLogger = INFO, stdout, dailyfile

log4j.appender.stdout = org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout = org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%5p [%d] ({%t} %F[%M]:%L) - %m%n

log4j.appender.dailyfile.DatePattern=.yyyy-MM-dd
log4j.appender.dailyfile.Threshold = INFO
log4j.appender.dailyfile = org.apache.log4j.DailyRollingFileAppender
#log4j.appender.dailyfile.File = ${zeppelin.log.file}
log4j.appender.dailyfile.File = D:\\zeppelin-0.8.1-bin-all\\logs\\zep.log
log4j.appender.dailyfile.layout = org.apache.log4j.PatternLayout
log4j.appender.dailyfile.layout.ConversionPattern=%5p [%d] ({%t} %F[%M]:%L) - 
%m%n

Thanks
Ravi


--
Best Regards

Jeff Zhang


Re: Should python-2 be supported in Spark 3.0?

2019-05-30 Thread Felix Cheung
We don’t usually reference a future release on the website

> Spark website and state that Python 2 is deprecated in Spark 3.0

I suspect people will then ask when Spark 3.0 is coming out. We might need to 
provide some clarity on that.



From: Reynold Xin 
Sent: Thursday, May 30, 2019 12:59:14 AM
To: shane knapp
Cc: Erik Erlandson; Mark Hamstra; Matei Zaharia; Sean Owen; Wenchen Fen; 
Xiangrui Meng; dev; user
Subject: Re: Should python-2 be supported in Spark 3.0?

+1 on Xiangrui’s plan.

On Thu, May 30, 2019 at 7:55 AM shane knapp 
mailto:skn...@berkeley.edu>> wrote:
I don't have a good sense of the overhead of continuing to support
Python 2; is it large enough to consider dropping it in Spark 3.0?

from the build/test side, it will actually be pretty easy to continue support 
for python2.7 for spark 2.x as the feature sets won't be expanding.

that being said, i will be cracking a bottle of champagne when i can delete all 
of the ansible and anaconda configs for python2.x.  :)

shane
--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu



Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar Processing Support

2019-05-27 Thread Felix Cheung
+1

I’d prefer to see more of the end goal and how that could be achieved (such as 
ETL or SPARK-24579). However, given the rounds and months of discussion, we have 
come down to just the public API.

If the community thinks a new set of public APIs is maintainable, I don’t see 
any problem with that.


From: Tom Graves 
Sent: Sunday, May 26, 2019 8:22:59 AM
To: hol...@pigscanfly.ca; Reynold Xin
Cc: Bobby Evans; DB Tsai; Dongjoon Hyun; Imran Rashid; Jason Lowe; Matei 
Zaharia; Thomas graves; Xiangrui Meng; Xiangrui Meng; dev
Subject: Re: [VOTE][SPARK-27396] SPIP: Public APIs for extended Columnar 
Processing Support

More feedback would be great. This has been open a long time, though; let's 
extend until Wednesday the 29th and see where we are at.

Tom



Sent from Yahoo Mail on 
Android

On Sat, May 25, 2019 at 6:28 PM, Holden Karau
 wrote:
Same I meant to catch up after kubecon but had some unexpected travels.

On Sat, May 25, 2019 at 10:56 PM Reynold Xin 
mailto:r...@databricks.com>> wrote:
Can we push this to June 1st? I have been meaning to read it but unfortunately 
keeps traveling...

On Sat, May 25, 2019 at 8:31 PM Dongjoon Hyun 
mailto:dongjoon.h...@gmail.com>> wrote:
+1

Thanks,
Dongjoon.

On Fri, May 24, 2019 at 17:03 DB Tsai  wrote:
+1 on exposing the APIs for columnar processing support.

I understand that the scope of this SPIP doesn't cover AI / ML
use-cases. But I saw a good performance gain when I converted data
from rows to columns to leverage on SIMD architectures in a POC ML
application.

With the exposed columnar processing support, I can imagine that the
heavy lifting parts of ML applications (such as computing the
objective functions) can be written as columnar expressions that
leverage on SIMD architectures to get a good speedup.

Sincerely,

DB Tsai
--
Web: https://www.dbtsai.com
PGP Key ID: 42E5B25A8F7A82C1

On Wed, May 15, 2019 at 2:59 PM Bobby Evans 
mailto:reva...@gmail.com>> wrote:
>
> It would allow for the columnar processing to be extended through the 
> shuffle.  So if I were doing say an FPGA accelerated extension it could 
> replace the ShuffleExchangeExec with one that can take a ColumnarBatch as 
> input instead of a Row. The extended version of the ShuffleExchangeExec could 
> then do the partitioning on the incoming batch and instead of producing a 
> ShuffleRowRDD for the exchange they could produce something like a 
> ShuffleBatchRDD that would let the serializing and deserializing happen in a 
> column based format for a faster exchange, assuming that columnar processing 
> is also happening after the exchange. This is just like providing a columnar 
> version of any other catalyst operator, except in this case it is a bit more 
> complex of an operator.
>
> On Wed, May 15, 2019 at 12:15 PM Imran Rashid  
> wrote:
>>
>> sorry I am late to the discussion here -- the jira mentions using this 
>> extensions for dealing with shuffles, can you explain that part?  I don't 
>> see how you would use this to change shuffle behavior at all.
>>
>> On Tue, May 14, 2019 at 10:59 AM Thomas graves 
>> mailto:tgra...@apache.org>> wrote:
>>>
>>> Thanks for replying, I'll extend the vote til May 26th to allow your
>>> and other people feedback who haven't had time to look at it.
>>>
>>> Tom
>>>
>>> On Mon, May 13, 2019 at 4:43 PM Holden Karau 
>>> mailto:hol...@pigscanfly.ca>> wrote:
>>> >
>>> > I’d like to ask this vote period to be extended, I’m interested but I 
>>> > don’t have the cycles to review it in detail and make an informed vote 
>>> > until the 25th.
>>> >
>>> > On Tue, May 14, 2019 at 1:49 AM Xiangrui Meng 
>>> > mailto:m...@databricks.com>> wrote:
>>> >>
>>> >> My vote is 0. Since the updated SPIP focuses on ETL use cases, I don't 
>>> >> feel strongly about it. I would still suggest doing the following:
>>> >>
>>> >> 1. Link the POC mentioned in Q4. So people can verify the POC result.
>>> >> 2. List public APIs we plan to expose in Appendix A. I did a quick 
>>> >> check. Beside ColumnarBatch and ColumnarVector, we also need to make the 
>>> >> following public. People who are familiar with SQL internals should help 
>>> >> assess the risk.
>>> >> * ColumnarArray
>>> >> * ColumnarMap
>>> >> * unsafe.types.CalendarInterval
>>> >> * ColumnarRow
>>> >> * UTF8String
>>> >> * ArrayData
>>> >> * ...
>>> >> 3. I still feel using Pandas UDF as the mid-term success doesn't match 
>>> >> the purpose of this SPIP. It does make some code cleaner. But I guess 
>>> >> for ETL use cases, it won't bring much value.
>>> >>
>>> > --
>>> > Twitter: https://twitter.com/holdenkarau
>>> > Books (Learning Spark, High Performance Spark, etc.): 
>>> > https://amzn.to/2MaRAG9
>>> > YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>>

[jira] [Commented] (SPARK-21367) R older version of Roxygen2 on Jenkins

2019-05-12 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838188#comment-16838188
 ] 

Felix Cheung commented on SPARK-21367:
--

great thx!

> R older version of Roxygen2 on Jenkins
> --
>
> Key: SPARK-21367
> URL: https://issues.apache.org/jira/browse/SPARK-21367
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.3.0
>Reporter: Felix Cheung
>Assignee: shane knapp
>Priority: Major
> Attachments: R.paks
>
>
> Getting this message from a recent build.
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79461/console
> Warning messages:
> 1: In check_dep_version(pkg, version, compare) :
>   Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
> 2: In check_dep_version(pkg, version, compare) :
>   Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
> * installing *source* package 'SparkR' ...
> ** R
> We have been running with 5.0.1 and haven't changed for a year.
> NOTE: Roxygen 6.x has some big changes and IMO we should not move to that yet.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27684) Reduce ScalaUDF conversion overheads for primitives

2019-05-12 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838187#comment-16838187
 ] 

Felix Cheung commented on SPARK-27684:
--

definitely could be interesting..

> Reduce ScalaUDF conversion overheads for primitives
> ---
>
> Key: SPARK-27684
> URL: https://issues.apache.org/jira/browse/SPARK-27684
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Josh Rosen
>Priority: Major
>
> I believe that we can reduce ScalaUDF overheads when operating over primitive 
> types.
> In [ScalaUDF's 
> doGenCode|https://github.com/apache/spark/blob/5a8aad01c2aaf0ceef8e9a3cfabbd2e88c8d9f0d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala#L991]
>  we have logic to convert UDF function input types from Catalyst internal 
> types to Scala types (for example, this is used to convert UTF8Strings to 
> Java Strings). Similarly, we convert UDF return types.
> However, UDF input argument conversion is effectively a no-op for primitive 
> types because {{CatalystTypeConverters.createToScalaConverter()}} returns 
> {{identity}} in those cases. UDF result conversion is a little trickier 
> because {{createToCatalystConverter()}} returns [a 
> function|https://github.com/apache/spark/blob/5a8aad01c2aaf0ceef8e9a3cfabbd2e88c8d9f0d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala#L413]
>  that handles {{Option[Primitive]}}, but it might be the case that the 
> Option-boxing is unusable via ScalaUDF (in which case the conversion truly is 
> an {{identity}} no-op).
> These unnecessary no-op conversions could be quite expensive because each 
> call involves an index into the {{references}} array to get the converters, a 
> second index into the converters array to get the correct converter for the 
> nth input argument, and, finally, the converter invocation itself:
> {code:java}
> Object project_arg_0 = false ? null : ((scala.Function1[]) references[1] /* 
> converters */)[0].apply(project_value_3);{code}
> In these cases, I believe that we can reduce lookup / invocation overheads by 
> modifying the ScalaUDF code generation to eliminate the conversion calls for 
> primitives and directly assign the unconverted result, e.g.
> {code:java}
> Object project_arg_0 = false ? null : project_value_3;{code}
> To cleanly handle the case where we have a multi-argument UDF accepting a 
> mixture of primitive and non-primitive types, we might be able to keep the 
> {{converters}} array the same size (so indexes stay the same) but omit the 
> invocation of the converters for the primitive arguments (e.g. {{converters}} 
> is sparse / contains unused entries in case of primitives).
> I spotted this optimization while trying to construct some quick benchmarks 
> to measure UDF invocation overheads. For example:
> {code:java}
> spark.udf.register("identity", (x: Int) => x)
> sql("select id, id * 2, id * 3 from range(1000 * 1000 * 1000)").rdd.count() 
> // ~ 52 seconds
> sql("select identity(id), identity(id * 2), identity(id * 3) from range(1000 
> * 1000 * 1000)").rdd.count() // ~84 seconds{code}
> I'm curious to see whether the optimization suggested here can close this 
> performance gap. It'd also be a good idea to construct more principled 
> microbenchmarks covering multi-argument UDFs, projections involving multiple 
> UDFs over different input and output types, etc.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Fwd: Missing podling logos

2019-05-12 Thread Felix Cheung
Please add the project logo!


-- Forwarded message -
From: Justin Mclean 
Date: Sat, May 11, 2019 at 9:47 PM
Subject: Re: Missing podling logos
To: 
CC: 


Hi,

My second list for the logos  [1] had a few errors as it assumed .svg files
existed for all projects. Try this one instead:

Logos missing:
amaterasu
annotator
batchee
brpc
datasketches
dlab
flagon
gobblin
hudi
iotdb
marvin
omid
pinot
ratis
rya
s2graph
samoa
sdap
shardingsphere
singa
tamaya
training
tephra
toree
tuweni
tvm
weex
zipkin

Thanks,
Justin

1. http://www.apache.org/logos/
-
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org


Re: Static partitioning in partitionBy()

2019-05-07 Thread Felix Cheung
You could

df.filter(col(“c”) = “c1”).write().partitionBy(“c”).save

It could have some data skew problems but might work for you
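
For concreteness, a minimal Scala sketch of that workaround, assuming a DataFrame
df with a partition column c; the output path and the overwrite mode are
illustrative assumptions, and note that column equality in the Scala API is
written with ===:

import org.apache.spark.sql.functions.col

// Keep only the rows belonging to partition c = 'c1'; partitionBy("c") still
// lays the output out under a c=c1/ subdirectory of the target path.
df.filter(col("c") === "c1")
  .write
  .mode("overwrite")
  .partitionBy("c")
  .save("/path/to/output")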




From: Burak Yavuz 
Sent: Tuesday, May 7, 2019 9:35:10 AM
To: Shubham Chaurasia
Cc: dev; u...@spark.apache.org
Subject: Re: Static partitioning in partitionBy()

It depends on the data source. Delta Lake (https://delta.io) allows you to do 
it with the .option("replaceWhere", "c = c1"). With other file formats, you can 
write directly into the partition directory (tablePath/c=c1), but you lose 
atomicity.

On Tue, May 7, 2019, 6:36 AM Shubham Chaurasia 
mailto:shubh.chaura...@gmail.com>> wrote:
Hi All,

Is there a way I can provide static partitions in partitionBy()?

Like:
df.write.mode("overwrite").format("MyDataSource").partitionBy("c=c1").save

Above code gives following error as it tries to find column `c=c1` in df.

org.apache.spark.sql.AnalysisException: Partition column `c=c1` not found in 
schema struct;

Thanks,
Shubham


Re: Static partitioning in partitionBy()

2019-05-07 Thread Felix Cheung
You could

df.filter(col(“c”) = “c1”).write().partitionBy(“c”).save

It could have some data skew problems but might work for you




From: Burak Yavuz 
Sent: Tuesday, May 7, 2019 9:35:10 AM
To: Shubham Chaurasia
Cc: dev; user@spark.apache.org
Subject: Re: Static partitioning in partitionBy()

It depends on the data source. Delta Lake (https://delta.io) allows you to do 
it with the .option("replaceWhere", "c = c1"). With other file formats, you can 
write directly into the partition directory (tablePath/c=c1), but you lose 
atomicity.

On Tue, May 7, 2019, 6:36 AM Shubham Chaurasia 
mailto:shubh.chaura...@gmail.com>> wrote:
Hi All,

Is there a way I can provide static partitions in partitionBy()?

Like:
df.write.mode("overwrite").format("MyDataSource").partitionBy("c=c1").save

Above code gives following error as it tries to find column `c=c1` in df.

org.apache.spark.sql.AnalysisException: Partition column `c=c1` not found in 
schema struct;

Thanks,
Shubham


Re: [VOTE] Release Apache Spark 2.4.3

2019-05-05 Thread Felix Cheung
I ran basic tests on R, r-hub etc. LGTM.

+1 (limited - I didn’t get to run other usual tests)


From: Sean Owen 
Sent: Wednesday, May 1, 2019 2:21 PM
To: Xiao Li
Cc: dev@spark.apache.org
Subject: Re: [VOTE] Release Apache Spark 2.4.3

+1 from me. There is little change from 2.4.2 anyway, except for the
important change to the build script that should build pyspark with
Scala 2.11 jars. I verified that the package contains the _2.11 Spark
jars, but have a look!

I'm still getting this weird error from the Kafka module when testing,
but it's a long-standing weird known issue:

[error] 
/home/ubuntu/spark-2.4.3/external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumerSuite.scala:85:
Symbol 'term org.eclipse' is missing from the classpath.
[error] This symbol is required by 'method
org.apache.spark.metrics.MetricsSystem.getServletHandlers'.
[error] Make sure that term eclipse is in your classpath and check for
conflicting dependencies with `-Ylog-classpath`.
[error] A full rebuild may help if 'MetricsSystem.class' was compiled
against an incompatible version of org.
[error] testUtils.sendMessages(topic, data.toArray)

Killing zinc and rebuilding didn't help.
But this isn't happening in Jenkins for example, so it should be env-specific.

On Wed, May 1, 2019 at 9:39 AM Xiao Li  wrote:
>
> Please vote on releasing the following candidate as Apache Spark version 
> 2.4.3.
>
> The vote is open until May 5th PST and passes if a majority +1 PMC votes are 
> cast, with
> a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 2.4.3
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.4.3-rc1 (commit 
> c3e32bf06c35ba2580d46150923abfa795b4446a):
> https://github.com/apache/spark/tree/v2.4.3-rc1
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.4.3-rc1-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1324/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.4.3-rc1-docs/
>
> The list of bug fixes going into 2.4.2 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12345410
>
> The release is using the release script of the branch 2.4.3-rc1 with the 
> following commit 
> https://github.com/apache/spark/commit/e417168ed012190db66a21e626b2b8d2332d6c01
>
> FAQ
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your projects resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 2.4.3?
> ===
>
> The current list of open tickets targeted at 2.4.3 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target 
> Version/s" = 2.4.3
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
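
As a small sketch of the "add the staging repository to your projects resolvers"
step in the testing instructions above, an sbt project could point at the staging
repository from the vote email like this (the spark-sql module is just an example
dependency):

// build.sbt
resolvers += "Apache Spark 2.4.3 RC1 staging" at
  "https://repository.apache.org/content/repositories/orgapachespark-1324/"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.3"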



Re: [VOTE] Release Apache Spark 2.4.2

2019-05-01 Thread Felix Cheung
Just my 2c

If there is a known security issue, we should fix it rather than waiting for whether 
it actually affects Spark to be discovered by a black hat, or worse.

I don’t think any of us want to see Spark in the news for this reason.

From: Sean Owen 
Sent: Tuesday, April 30, 2019 1:52:53 PM
To: Reynold Xin
Cc: Jungtaek Lim; Dongjoon Hyun; Wenchen Fan; Michael Heuer; Terry Kim; dev; 
Xiao Li
Subject: Re: [VOTE] Release Apache Spark 2.4.2

FWIW I'm OK with this even though I proposed the backport PR for discussion. It 
really is a tough call, balancing the potential but as-yet unclear security 
benefit vs minor but real Jackson deserialization behavior change.

Because we have a pressing need for a 2.4.3 release (really a 2.4.2.1 almost) I 
think it's reasonable to defer a final call on this in 2.4.x and revert for 
now. Leaving it in 2.4.3 makes it quite permanent.

A little more color on the discussion:
- I don't think https://github.com/apache/spark/pull/22071 mitigates the 
theoretical problem here; I would guess the attack vector is deserializing a 
malicious JSON file. This is unproven either way
- The behavior change we know is basically what you see in the revert PR: 
entries like "'foo': null" aren't written by Jackson by default in 2.7+. You 
can make them so but it needs a code tweak in any app that inherits Spark's 
Jackson
- This is not related to Scala version

This is for a discussion about re-including in 2.4.4:
- Does anyone know that the Jackson issues really _could_ affect Spark
- Does anyone have concrete examples of why the behavior change is a bigger 
deal, or not as big a deal, as anticipated?

On Tue, Apr 30, 2019 at 1:34 AM Reynold Xin 
mailto:r...@databricks.com>> wrote:

Echoing both of you ... it's a bit risky to bump dependency versions in a patch 
release, especially for a super common library. (I wish we shaded Jackson).

Maybe the CVE is a sufficient reason to bump the dependency, ignoring the 
potential behavior changes that might happen, but I'd like to see a bit more 
discussions there and have 2.4.3 focusing on fixing the Scala version issue 
first.



On Mon, Apr 29, 2019 at 11:17 PM, Jungtaek Lim 
mailto:kabh...@gmail.com>> wrote:
Ah! Sorry Xiao, I should have checked the fix version of the issue (it's 2.4.3/3.0.0).

Then it looks much better to revert and avoid a dependency conflict in a bugfix 
release. Jackson is one of those known things that makes non-backward-compatible 
changes in non-major versions, so I agree it's something to be careful about, or 
to shade/relocate and forget about.

On Tue, Apr 30, 2019 at 3:04 PM Xiao Li 
mailto:lix...@databricks.com>> wrote:
Jungtaek,

Thanks for your inputs! Sorry for the confusion. Let me make it clear.

  *   All the previous 2.4.x [including 2.4.2] releases are using Jackson 
2.6.7.1.
  *   In the master branch, the Jackson is already upgraded to 2.9.8.
  *   Here, I just try to revert Jackson upgrade in the upcoming 2.4.3 release.

Cheers,

Xiao

On Mon, Apr 29, 2019 at 10:53 PM Jungtaek Lim 
mailto:kabh...@gmail.com>> wrote:
Just to be clear, is upgrading Jackson to 2.9.8 coupled with the Scala 
version? And could you summarize an actual broken case due to the upgrade, if 
you observed anything? Providing an actual case would help us weigh the impact.

Btw, my 2 cents: personally I would rather avoid upgrading dependencies in a 
bugfix release unless it resolves major bugs, so reverting it from only 
branch-2.4 sounds good to me. (I still think the Jackson upgrade is necessary in 
the master branch, avoiding lots of CVEs whose impact we would otherwise waste a 
huge amount of time identifying. And other libs will start coupling with Jackson 
2.9.x, which conflicts with Spark's Jackson dependency.)

If there is a consensus on reverting that, we may also need to announce that 
Spark 2.4.2 is discouraged from use; otherwise end users will suffer from the 
Jackson version going back and forth.

Thanks,
Jungtaek Lim (HeartSaVioR)

On Tue, Apr 30, 2019 at 2:30 PM Xiao Li 
mailto:lix...@databricks.com>> wrote:
Before cutting 2.4.3, I just submitted a PR 
https://github.com/apache/spark/pull/24493 for reverting the commit 
https://github.com/apache/spark/commit/6f394a20bf49f67b4d6329a1c25171c8024a2fae.

In general, we need to be very cautious about the Jackson upgrade in the patch 
releases, especially when this upgrade could break the existing behaviors of 
the external packages or data sources, and generate different results after the 
upgrade. The external packages and data sources need to change their source 
code to keep the original behaviors. The upgrade requires more discussions 
before releasing it, I think.

In the previous PR https://github.com/apache/spark/pull/22071, we turned off 
`spark.master.rest.enabled` by default and 
added the following claim in our security doc:
The Rest Submission Server and the MesosClusterDispatcher do not support 
authentication.  You should ensure that all network access to the 

Re: [VOTE] Release Apache Spark 2.4.2

2019-04-21 Thread Felix Cheung
+1

R tests, package tests on r-hub. Manually check commits under R, doc etc



From: Sean Owen 
Sent: Saturday, April 20, 2019 11:27 AM
To: Wenchen Fan
Cc: Spark dev list
Subject: Re: [VOTE] Release Apache Spark 2.4.2

+1 from me too.

It seems like there is support for merging the Jackson change into
2.4.x (and, I think, a few more minor dependency updates) but this
doesn't have to go into 2.4.2. That said, if there is another RC for
any reason, I think we could include it. Otherwise can wait for 2.4.3.

On Thu, Apr 18, 2019 at 9:51 PM Wenchen Fan  wrote:
>
> Please vote on releasing the following candidate as Apache Spark version 
> 2.4.2.
>
> The vote is open until April 23 PST and passes if a majority +1 PMC votes are 
> cast, with
> a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 2.4.2
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.4.2-rc1 (commit 
> a44880ba74caab7a987128cb09c4bee41617770a):
> https://github.com/apache/spark/tree/v2.4.2-rc1
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.4.2-rc1-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1322/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.4.2-rc1-docs/
>
> The list of bug fixes going into 2.4.1 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12344996
>
> FAQ
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your projects resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 2.4.2?
> ===
>
> The current list of open tickets targeted at 2.4.2 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target 
> Version/s" = 2.4.2
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Spark 2.4.2

2019-04-18 Thread Felix Cheung
Re shading - same argument I’ve made earlier today in a PR...

(Context: in many cases Spark has light or indirect dependencies, but bringing 
them into the process easily breaks users' code.)



From: Michael Heuer 
Sent: Thursday, April 18, 2019 6:41 AM
To: Reynold Xin
Cc: Sean Owen; Michael Armbrust; Ryan Blue; Spark Dev List; Wenchen Fan; Xiao Li
Subject: Re: Spark 2.4.2

+100


On Apr 18, 2019, at 1:48 AM, Reynold Xin 
mailto:r...@databricks.com>> wrote:

We should have shaded all Spark’s dependencies :(

On Wed, Apr 17, 2019 at 11:47 PM Sean Owen 
mailto:sro...@gmail.com>> wrote:
For users that would inherit Jackson and use it directly, or whose
dependencies do. Spark itself (with modifications) should be OK with
the change.
It's risky and normally wouldn't backport, except that I've heard a
few times about concerns about CVEs affecting Databind, so wondering
who else out there might have an opinion. I'm not pushing for it
necessarily.

On Wed, Apr 17, 2019 at 6:18 PM Reynold Xin 
mailto:r...@databricks.com>> wrote:
>
> For Jackson - are you worrying about JSON parsing for users or internal Spark 
> functionality breaking?
>
> On Wed, Apr 17, 2019 at 6:02 PM Sean Owen 
> mailto:sro...@gmail.com>> wrote:
>>
>> There's only one other item on my radar, which is considering updating
>> Jackson to 2.9 in branch-2.4 to get security fixes. Pros: it's come up
>> a few times now that there are a number of CVEs open for 2.6.7. Cons:
>> not clear they affect Spark, and Jackson 2.6->2.9 does change Jackson
>> behavior non-trivially. That said back-porting the update PR to 2.4
>> worked out OK locally. Any strong opinions on this one?
>>
>> On Wed, Apr 17, 2019 at 7:49 PM Wenchen Fan 
>> mailto:cloud0...@gmail.com>> wrote:
>> >
>> > I volunteer to be the release manager for 2.4.2, as I was also going to 
>> > propose 2.4.2 because of the reverting of SPARK-25250. Is there any other 
>> > ongoing bug fixes we want to include in 2.4.2? If no I'd like to start the 
>> > release process today (CST).
>> >
>> > Thanks,
>> > Wenchen
>> >
>> > On Thu, Apr 18, 2019 at 3:44 AM Sean Owen 
>> > mailto:sro...@gmail.com>> wrote:
>> >>
>> >> I think the 'only backport bug fixes to branches' principle remains 
>> >> sound. But what's a bug fix? Something that changes behavior to match 
>> >> what is explicitly supposed to happen, or implicitly supposed to happen 
>> >> -- implied by what other similar things do, by reasonable user 
>> >> expectations, or simply how it worked previously.
>> >>
>> >> Is this a bug fix? I guess the criteria that matches is that behavior 
>> >> doesn't match reasonable user expectations? I don't know enough to have a 
>> >> strong opinion. I also don't think there is currently an objection to 
>> >> backporting it, whatever it's called.
>> >>
>> >>
>> >> Is the question whether this needs a new release? There's no harm in 
>> >> another point release, other than needing a volunteer release manager. 
>> >> One could say, wait a bit longer to see what more info comes in about 
>> >> 2.4.1. But given that 2.4.1 took like 2 months, it's reasonable to move 
>> >> towards a release cycle again. I don't see objection to that either (?)
>> >>
>> >>
>> >> The meta question remains: is a 'bug fix' definition even agreed, and 
>> >> being consistently applied? There aren't correct answers, only best 
>> >> guesses from each person's own experience, judgment and priorities. These 
>> >> can differ even when applied in good faith.
>> >>
>> >> Sometimes the variance of opinion comes because people have different 
>> >> info that needs to be surfaced. Here, maybe it's best to share what about 
>> >> that offline conversation was convincing, for example.
>> >>
>> >> I'd say it's also important to separate what one would prefer from what 
>> >> one can't live with(out). Assuming one trusts the intent and experience 
>> >> of the handful of others with an opinion, I'd defer to someone who wants 
>> >> X and will own it, even if I'm moderately against it. Otherwise we'd get 
>> >> little done.
>> >>
>> >> In that light, it seems like both of the PRs at issue here are not 
>> >> _wrong_ to backport. This is a good pair that highlights why, when there 
>> >> isn't a clear reason to do / not do something (e.g. obvious errors, 
>> >> breaking public APIs) we give benefit-of-the-doubt in order to get it 
>> >> later.
>> >>
>> >>
>> >> On Wed, Apr 17, 2019 at 12:09 PM Ryan Blue 
>> >> mailto:rb...@netflix.com.invalid>> wrote:
>> >>>
>> >>> Sorry, I should be more clear about what I'm trying to say here.
>> >>>
>> >>> In the past, Xiao has taken the opposite stance. A good example is PR 
>> >>> #21060 that was a very similar situation: behavior didn't match what was 
>> >>> expected and there was low risk. There was a long argument and the patch 
>> >>> didn't make it into 2.3 (to my knowledge).
>> >>>
>> >>> What we call these low-risk behavior fixes doesn't matter. I called it a 
>> >>> bug on 

Re: ApacheCon NA 2019 Call For Proposal and help promoting Spark project

2019-04-14 Thread Felix Cheung
And a plug for the Graph Processing track -

A discussion of comparison talk between the various Spark options (GraphX, 
GraphFrames, CAPS), or the ongoing work with SPARK-25994 Property Graphs, 
Cypher Queries, and Algorithms

Would be great!




From: Felix Cheung 
Sent: Saturday, April 13, 2019 9:51 AM
To: Spark Dev List; user@spark.apache.org
Subject: ApacheCon NA 2019 Call For Proposal and help promoting Spark project

Hi Spark community!

As you know ApacheCon NA 2019 is coming this Sept and it’s CFP is now open! 
This is an important milestone as we celebrate 20 years of ASF. We have tracks 
like Big Data and Machine Learning among many others. Please submit your 
talks/thoughts/challenges/learnings here:
https://www.apachecon.com/acna19/cfp.html

Second, as a community I think it’d be great if we have a post on 
http://spark.apache.org/ website to promote this event also. We already have a 
logo link up and perhaps we could add a post to talk about:
What is the Spark project, what might you learn, then a few suggestions of talk 
topics, why speak at the ApacheCon etc. This will then be linked to the 
ApacheCon official website. Any volunteer from the community?

Third, Twitter. I’m not sure who has access to the ApacheSpark Twitter account 
but it’d be great to promote this. Use the hashtags #ApacheCon and #ACNA19. 
Mention @Apachecon. Please use
https://www.apachecon.com/acna19/cfp.html to promote the CFP, and
https://www.apachecon.com/acna19 to promote the event as a whole.



Re: Dataset schema incompatibility bug when reading column partitioned data

2019-04-13 Thread Felix Cheung
I kinda agree it is confusing when a parameter is not used...


From: Ryan Blue 
Sent: Thursday, April 11, 2019 11:07:25 AM
To: Bruce Robbins
Cc: Dávid Szakállas; Spark Dev List
Subject: Re: Dataset schema incompatibility bug when reading column partitioned 
data


I think the confusion is that the schema passed to spark.read is not a 
projection schema. I don’t think it is even used in this case because the 
Parquet dataset has its own schema. You’re getting the schema of the table. I 
think the correct behavior is to reject a user-specified schema in this case.

On Thu, Apr 11, 2019 at 11:04 AM Bruce Robbins 
mailto:bersprock...@gmail.com>> wrote:
I see a Jira:

https://issues.apache.org/jira/browse/SPARK-21021

On Thu, Apr 11, 2019 at 9:08 AM Dávid Szakállas 
mailto:david.szakal...@gmail.com>> wrote:
+dev for more visibility. Is this a known issue? Is there a plan for a fix?

Thanks,
David

Begin forwarded message:

From: Dávid Szakállas 
mailto:david.szakal...@gmail.com>>
Subject: Dataset schema incompatibility bug when reading column partitioned data
Date: 2019. March 29. 14:15:27 CET
To: u...@spark.apache.org

We observed the following bug on Spark 2.4.0:


scala> 
spark.createDataset(Seq((1,2))).write.partitionBy("_1").parquet("foo.parquet")

scala> val schema = StructType(Seq(StructField("_1", 
IntegerType),StructField("_2", IntegerType)))

scala> spark.read.schema(schema).parquet("foo.parquet").as[(Int, Int)].show
+---+---+
| _2| _1|
+---+---+
|  2|  1|
+---+---+

That is, when reading column partitioned Parquet files the explicitly specified 
schema is not adhered to, instead the partitioning columns are appended the end 
of the column list. This is a quite severe issue as some operations, such as 
union, fails if columns are in a different order in two datasets. Thus we have 
to work around the issue with a select:

val columnNames = schema.fields.map(_.name)
ds.select(columnNames.head, columnNames.tail: _*)
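
For illustration, a minimal sketch of that workaround applied to the foo.parquet 
example above (reusing schema and columnNames from the snippet; the output shown 
is what the Seq((1,2)) input should produce):

val fixed = spark.read.schema(schema).parquet("foo.parquet")
  .select(columnNames.head, columnNames.tail: _*)  // reorder to match the schema
  .as[(Int, Int)]
fixed.show()
// +---+---+
// | _1| _2|
// +---+---+
// |  1|  2|
// +---+---+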


Thanks,
David Szakallas
Data Engineer | Whitepages, Inc.



--
Ryan Blue
Software Engineer
Netflix


ApacheCon NA 2019 Call For Proposal and help promoting Spark project

2019-04-13 Thread Felix Cheung
Hi Spark community!

As you know ApacheCon NA 2019 is coming this Sept and it’s CFP is now open! 
This is an important milestone as we celebrate 20 years of ASF. We have tracks 
like Big Data and Machine Learning among many others. Please submit your 
talks/thoughts/challenges/learnings here:
https://www.apachecon.com/acna19/cfp.html

Second, as a community I think it’d be great if we have a post on 
http://spark.apache.org/ website to promote this event also. We already have a 
logo link up and perhaps we could add a post to talk about:
What is the Spark project, what might you learn, then a few suggestions of talk 
topics, why speak at the ApacheCon etc. This will then be linked to the 
ApacheCon official website. Any volunteer from the community?

Third, Twitter. I’m not sure who has access to the ApacheSpark Twitter account 
but it’d be great to promote this. Use the hashtags #ApacheCon and #ACNA19. 
Mention @Apachecon. Please use
https://www.apachecon.com/acna19/cfp.html to promote the CFP, and
https://www.apachecon.com/acna19 to promote the event as a whole.



[jira] [Updated] (SPARK-21805) disable R vignettes code on Windows

2019-04-06 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-21805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-21805:
-
Issue Type: Sub-task  (was: Bug)
Parent: SPARK-15799

> disable R vignettes code on Windows
> ---
>
> Key: SPARK-21805
> URL: https://issues.apache.org/jira/browse/SPARK-21805
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 2.2.0, 2.3.0
>Reporter: Felix Cheung
>    Assignee: Felix Cheung
>Priority: Major
> Fix For: 2.2.1, 2.3.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-22344) Prevent R CMD check from using /tmp

2019-04-06 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-22344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-22344:
-
Issue Type: Sub-task  (was: Bug)
Parent: SPARK-15799

> Prevent R CMD check from using /tmp
> ---
>
> Key: SPARK-22344
> URL: https://issues.apache.org/jira/browse/SPARK-22344
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 1.6.3, 2.1.2, 2.2.0, 2.3.0
>Reporter: Shivaram Venkataraman
>Assignee: Shivaram Venkataraman
>Priority: Major
> Fix For: 2.2.1, 2.3.0
>
>
> When R CMD check is run on the SparkR package it leaves behind files in /tmp 
> which is a violation of CRAN policy. We should instead write to Rtmpdir. 
> Notes from CRAN are below
> {code}
> Checking this leaves behind dirs
>hive/$USER
>$USER
> and files named like
>b4f6459b-0624-4100-8358-7aa7afbda757_resources
> in /tmp, in violation of the CRAN Policy.
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-24535) Fix java version parsing in SparkR on Windows

2019-04-06 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-24535:
-
Issue Type: Sub-task  (was: Bug)
Parent: SPARK-15799

> Fix java version parsing in SparkR on Windows
> -
>
> Key: SPARK-24535
> URL: https://issues.apache.org/jira/browse/SPARK-24535
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 2.3.1, 2.4.0
>Reporter: Shivaram Venkataraman
>    Assignee: Felix Cheung
>Priority: Blocker
> Fix For: 2.3.2, 2.4.0
>
>
> We see errors on CRAN of the form 
> {code:java}
>   java version "1.8.0_144"
>   Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
>   Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
>   Picked up _JAVA_OPTIONS: -XX:-UsePerfData 
>   -- 1. Error: create DataFrame from list or data.frame (@test_basic.R#21)  
> --
>   subscript out of bounds
>   1: sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE, 
> sparkConfig = sparkRTestConfig) at 
> D:/temp/RtmpIJ8Cc3/RLIBS_3242c713c3181/SparkR/tests/testthat/test_basic.R:21
>   2: sparkR.sparkContext(master, appName, sparkHome, sparkConfigMap, 
> sparkExecutorEnvMap, 
>  sparkJars, sparkPackages)
>   3: checkJavaVersion()
>   4: strsplit(javaVersionFilter[[1]], "[\"]")
> {code}
> The complete log file is at 
> http://home.apache.org/~shivaram/SparkR_2.3.1_check_results/Windows/00check.log



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25572) SparkR tests failed on CRAN on Java 10

2019-04-06 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-25572:
-
Issue Type: Sub-task  (was: Bug)
Parent: SPARK-15799

> SparkR tests failed on CRAN on Java 10
> --
>
> Key: SPARK-25572
> URL: https://issues.apache.org/jira/browse/SPARK-25572
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 2.4.0
>Reporter: Felix Cheung
>    Assignee: Felix Cheung
>Priority: Major
> Fix For: 2.3.3, 2.4.0
>
>
> follow up to SPARK-24255
> from 2.3.2 release we can see that CRAN doesn't seem to respect the system 
> requirements as running tests - we have seen cases where SparkR is run on 
> Java 10, which unfortunately Spark does not start on. For 2.4.x, lets attempt 
> skipping all tests



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26010) SparkR vignette fails on CRAN on Java 11

2019-04-06 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-26010:
-
Issue Type: Sub-task  (was: Bug)
Parent: SPARK-15799

> SparkR vignette fails on CRAN on Java 11
> 
>
> Key: SPARK-26010
> URL: https://issues.apache.org/jira/browse/SPARK-26010
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Felix Cheung
>    Assignee: Felix Cheung
>Priority: Major
> Fix For: 2.3.3, 2.4.1, 3.0.0
>
>
> follow up to SPARK-25572
> but for vignettes
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15799) Release SparkR on CRAN

2019-04-06 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-15799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811654#comment-16811654
 ] 

Felix Cheung commented on SPARK-15799:
--

more fixes for this (did not open a JIRA)

[https://github.com/apache/spark/commit/fa0f791d4d9f083a45ab631a2e9f88a6b749e416#diff-e1e1d3d40573127e9ee0480caf1283d6]

[https://github.com/apache/spark/commit/927081dd959217ed6bf014557db20026d7e22672#diff-e1e1d3d40573127e9ee0480caf1283d6]

 

> Release SparkR on CRAN
> --
>
> Key: SPARK-15799
> URL: https://issues.apache.org/jira/browse/SPARK-15799
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Reporter: Xiangrui Meng
>Assignee: Shivaram Venkataraman
>Priority: Major
> Fix For: 2.1.2
>
>
> Story: "As an R user, I would like to see SparkR released on CRAN, so I can 
> use SparkR easily in an existing R environment and have other packages built 
> on top of SparkR."
> I made this JIRA with the following questions in mind:
> * Are there known issues that prevent us releasing SparkR on CRAN?
> * Do we want to package Spark jars in the SparkR release?
> * Are there license issues?
> * How does it fit into Spark's release process?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26910) Re-release SparkR to CRAN

2019-04-06 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung resolved SPARK-26910.
--
   Resolution: Fixed
Fix Version/s: 2.4.1

2.4.1. released [https://cran.r-project.org/web/packages/SparkR/index.html]

> Re-release SparkR to CRAN
> -
>
> Key: SPARK-26910
> URL: https://issues.apache.org/jira/browse/SPARK-26910
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Affects Versions: 2.4.0
>Reporter: Michael Chirico
>    Assignee: Felix Cheung
>Priority: Major
> Fix For: 2.4.1
>
>
> The logical successor to https://issues.apache.org/jira/browse/SPARK-15799
> I don't see anything specifically tracking re-release in the Jira list. It 
> would be helpful to have an issue tracking this to refer to as an outsider, 
> as well as to document what the blockers are in case some outside help could 
> be useful.
>  * Is there a plan to re-release SparkR to CRAN?
>  * What are the major blockers to doing so at the moment?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27389) pyspark test failures w/ "UnknownTimeZoneError: 'US/Pacific-New'"

2019-04-05 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811461#comment-16811461
 ] 

Felix Cheung commented on SPARK-27389:
--

maybe a new JDK changes TimeZone?

> pyspark test failures w/ "UnknownTimeZoneError: 'US/Pacific-New'"
> -
>
> Key: SPARK-27389
> URL: https://issues.apache.org/jira/browse/SPARK-27389
> Project: Spark
>  Issue Type: Task
>  Components: jenkins, PySpark
>Affects Versions: 3.0.0
>Reporter: Imran Rashid
>Assignee: shane knapp
>Priority: Major
>
> I've seen a few odd PR build failures w/ an error in pyspark tests about 
> "UnknownTimeZoneError: 'US/Pacific-New'".  eg. 
> https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4688/consoleFull
> A bit of searching tells me that US/Pacific-New probably isn't really 
> supposed to be a timezone at all: 
> https://mm.icann.org/pipermail/tz/2009-February/015448.html
> I'm guessing that this is from some misconfiguration of jenkins.  that said, 
> I can't figure out what is wrong.  There does seem to be a timezone entry for 
> US/Pacific-New in {{/usr/share/zoneinfo/US/Pacific-New}} -- but it seems to 
> be there on every amp-jenkins-worker, so I dunno what that alone would cause 
> this failure sometime.
> [~shaneknapp] I am tentatively calling this a "jenkins" issue, but I might be 
> totally wrong here and it is really a pyspark problem.
> Full Stack trace from the test failure:
> {noformat}
> ==
> ERROR: test_to_pandas (pyspark.sql.tests.test_dataframe.DataFrameTests)
> --
> Traceback (most recent call last):
>   File 
> "/home/jenkins/workspace/NewSparkPullRequestBuilder@2/python/pyspark/sql/tests/test_dataframe.py",
>  line 522, in test_to_pandas
> pdf = self._to_pandas()
>   File 
> "/home/jenkins/workspace/NewSparkPullRequestBuilder@2/python/pyspark/sql/tests/test_dataframe.py",
>  line 517, in _to_pandas
> return df.toPandas()
>   File 
> "/home/jenkins/workspace/NewSparkPullRequestBuilder@2/python/pyspark/sql/dataframe.py",
>  line 2189, in toPandas
> _check_series_convert_timestamps_local_tz(pdf[field.name], timezone)
>   File 
> "/home/jenkins/workspace/NewSparkPullRequestBuilder@2/python/pyspark/sql/types.py",
>  line 1891, in _check_series_convert_timestamps_local_tz
> return _check_series_convert_timestamps_localize(s, None, timezone)
>   File 
> "/home/jenkins/workspace/NewSparkPullRequestBuilder@2/python/pyspark/sql/types.py",
>  line 1877, in _check_series_convert_timestamps_localize
> lambda ts: ts.tz_localize(from_tz, 
> ambiguous=False).tz_convert(to_tz).tz_localize(None)
>   File "/home/anaconda/lib/python2.7/site-packages/pandas/core/series.py", 
> line 2294, in apply
> mapped = lib.map_infer(values, f, convert=convert_dtype)
>   File "pandas/src/inference.pyx", line 1207, in pandas.lib.map_infer 
> (pandas/lib.c:66124)
>   File 
> "/home/jenkins/workspace/NewSparkPullRequestBuilder@2/python/pyspark/sql/types.py",
>  line 1878, in 
> if ts is not pd.NaT else pd.NaT)
>   File "pandas/tslib.pyx", line 649, in pandas.tslib.Timestamp.tz_convert 
> (pandas/tslib.c:13923)
>   File "pandas/tslib.pyx", line 407, in pandas.tslib.Timestamp.__new__ 
> (pandas/tslib.c:10447)
>   File "pandas/tslib.pyx", line 1467, in pandas.tslib.convert_to_tsobject 
> (pandas/tslib.c:27504)
>   File "pandas/tslib.pyx", line 1768, in pandas.tslib.maybe_get_tz 
> (pandas/tslib.c:32362)
>   File "/home/anaconda/lib/python2.7/site-packages/pytz/__init__.py", line 
> 178, in timezone
> raise UnknownTimeZoneError(zone)
> UnknownTimeZoneError: 'US/Pacific-New'
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (LIVY-584) python build needs configparser

2019-04-03 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/LIVY-584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung closed LIVY-584.
-
Resolution: Duplicate

> python build needs configparser
> ---
>
> Key: LIVY-584
> URL: https://issues.apache.org/jira/browse/LIVY-584
> Project: Livy
>  Issue Type: Bug
>Affects Versions: 0.6.0
>    Reporter: Felix Cheung
>Priority: Major
>
> pip install configparser  just to build. it won't build until I manually pip 
> install.
>  
> (run into this in my 2nd environment)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (LIVY-584) python build needs configparser

2019-04-03 Thread Felix Cheung (JIRA)
Felix Cheung created LIVY-584:
-

 Summary: python build needs configparser
 Key: LIVY-584
 URL: https://issues.apache.org/jira/browse/LIVY-584
 Project: Livy
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Felix Cheung


pip install configparser  just to build. it won't build until I manually pip 
install.

 

(run into this in my 2nd environment)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (LIVY-583) python build needs configparser

2019-04-03 Thread Felix Cheung (JIRA)
Felix Cheung created LIVY-583:
-

 Summary: python build needs configparser
 Key: LIVY-583
 URL: https://issues.apache.org/jira/browse/LIVY-583
 Project: Livy
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Felix Cheung


pip install configparser  just to build. it won't build until I manually pip 
install.

 

(run into this in my 2nd environment)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (LIVY-582) python test_create_new_session_without_default_config test fails consistently

2019-04-03 Thread Felix Cheung (JIRA)
Felix Cheung created LIVY-582:
-

 Summary: python test_create_new_session_without_default_config 
test fails consistently
 Key: LIVY-582
 URL: https://issues.apache.org/jira/browse/LIVY-582
 Project: Livy
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Felix Cheung


{code:java}
test_create_new_session_without_default_config 

def test_create_new_session_without_default_config():
> mock_and_validate_create_new_session(False)

src/test/python/livy-tests/client_test.py:105:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
:3: in wrapper
???
src/test/python/livy-tests/client_test.py:48: in 
mock_and_validate_create_new_session
load_defaults=defaults)
src/main/python/livy/client.py:88: in __init__
session_conf_dict).json()['id']
src/main/python/livy/client.py:388: in _create_new_session
headers=self._conn._JSON_HEADERS, data=data)
src/main/python/livy/client.py:500: in send_request
json=data, auth=self._spnego_auth())
.eggs/requests-2.21.0-py2.7.egg/requests/api.py:60: in request
return session.request(method=method, url=url, **kwargs)
.eggs/requests-2.21.0-py2.7.egg/requests/sessions.py:533: in request
resp = self.send(prep, **send_kwargs)
.eggs/requests-2.21.0-py2.7.egg/requests/sessions.py:646: in send
r = adapter.send(request, **kwargs)
.eggs/responses-0.10.6-py2.7.egg/responses.py:626: in unbound_on_send
return self._on_request(adapter, request, *a, **kwargs)

self = 
adapter = 
request = 
kwargs = {'cert': None, 'proxies': OrderedDict(), 'stream': False, 'timeout': 
10, ...}
match = None, resp_callback = None
error_msg = "Connection refused by Responses: POST 
http://machine:8998/sessions/ doesn't match Responses Mock"
response = ConnectionError(u"Connection refused by Responses: POST 
http://machine:8998/sessions/doesn't match Responses Mock",)
{code}
Not sure why. This fails 100% of the time and I don't see anything listening on 
this port. Need some help troubleshooting this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (LIVY-580) Python 3 fails unicode test written for Python 2

2019-04-03 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/LIVY-580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808998#comment-16808998
 ] 

Felix Cheung commented on LIVY-580:
---

thx

> Python 3 fails unicode test written for Python 2
> 
>
> Key: LIVY-580
> URL: https://issues.apache.org/jira/browse/LIVY-580
> Project: Livy
>  Issue Type: Bug
>  Components: Interpreter
>Affects Versions: 0.6
>Reporter: Marcelo Vanzin
>Priority: Major
>
> This is a twofer:
>  
> 1. If "python" defaults to python 3, then the unit tests are running twice 
> against python 3, and not testing python 2.
> 2. In that case, the extra "print unicode" test runs against python 3, 
> instead of the target python 2, and fails, which may be pointing at some 
> problem with Livy's unicode handling on python 3.
>  
> {noformat}
> - should print unicode correctly *** FAILED *** (101 milliseconds)
> ExecuteSuccess(JObject(List((text/plain,JString(☺) did not equal
> ExecuteSuccess(JObject(List((text/plain,JString()
> (PythonInterpreterSpec.scala:272)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: IoTDB supports distributed version

2019-03-31 Thread Felix Cheung
The use case I’m thinking about for time series data is a bit sensitive to 
data loss - for example, transaction records.

I think I’d generally agree on A in CAP too. But is it going to be eventual 
consistency? Or could it split-brain and lose data?




From: Julian Feinauer 
Sent: Sunday, March 31, 2019 11:34 AM
To: dev@iotdb.apache.org
Subject: Re: IoTDB supports distributed version

Hi Felix,

could you elaborate a bit on your use cases?
I am a bit unsure about the consistency, so it would be interesting to hear 
where you see the important points.

Thanks!
Julian

Am 31.03.19, 20:25 schrieb "Felix Cheung" :

I, on the other hand, would be very interested in the strong consistency option.

(Very cool discussion!)



From: Julian Feinauer 
Sent: Thursday, March 28, 2019 1:10 AM
To: dev@iotdb.apache.org
Subject: Re: IoTDB supports distributed version

Hi,

this is a very interesting (and important) question.
I think we should really consider what we can skip (from an application 
perspective) and what to keep.
Perhaps a Token Ring architecture like Cassandra uses could also be a good fit, 
if we hash on the device id or something.
At least in the situations and use cases I know (strong) consistency is not soo 
important.

From a CAP perspective, for me, Availability is the only undiscussable 
necessary thing... for the others... we can discuss : )

Julian

PS.: Perhaps it would be beneficial to create a design doc in confluence?

Am 28.03.19, 08:57 schrieb "Xiangdong Huang" :

yep, I think the cluster is in P2P mode when it starts up. Then a leader
election algorithm will change the cluster into the M/S mode (RAFT
algorithm is qualified). If the master is down, a new master can be elected
and lead the cluster.

By the way, we need to consider the cost of keeping strong consistency of
data. Time series data in the IoT scenario rarely conflicts (a conflict would
mean user1 sends a data point (t1, v1) that represents device 1 and sensor 1
while user2 sends a data point (t2, v2) that represents the same device and
sensor and t2=t1). So, supporting multiple consistency levels is better for
keeping write performance high.

Best,

---
Xiangdong Huang
School of Software, Tsinghua University

黄向东
清华大学 软件学院


Julian Feinauer  于2019年3月28日周四 下午3:21写道:

> Hi XuYi,
>
> I like the idea but I'm unsure if I like the master / slave approach.
> We often deal with "Shopfloor" Scenarios where the setup for the Database
> is basically "MultiMaster", because we need to sync data one the one hand,
> but if a system goes down, everything else should keep working.
> Would this be possible with your approach?
> Something like leader re-election with Zookeper (or better Curator?).
> What exactly are the use cases you have in mind?
>
> Thanks!
> Julian
>
> Am 28.03.19, 05:32 schrieb "徐毅" :
>
>
>
>
> Hi,
>
>
>
>
> IoTDB only supports stand-alone version now. We plan to develop
> distributed version in next two months.
>
> We initially decided to use the master-slave architecture. The master
> node is responsible for processing read and write requests, and the slave
> node, which is a copy of master node is responsible for processing
> read-only requests.
>
> In terms of implementation, we currently intend to use the raft
> protocol to ensure the data consistency of each replica node.
>
> I have created an issue on jira at [1]. If you have any suggestion,
> please comment on jira or reply to this email.
>
> [1]. https://issues.apache.org/jira/browse/IOTDB-68
>
>
>
>
> Thanks
>
> XuYi
>
>






Re: Submitting code

2019-03-31 Thread Felix Cheung
Wait wait. Please clarify the goal of the branch? If it is just for “many big 
changes that might not be stable enough for master” then I’m for it. But we must 
follow the current process of PR, review, and test for each PR or merge, by 
committers who have write access to the repo, for each PR. And as Jongyoul 
said, smaller PRs that are complete make it easier. Some test infra might have 
problems testing non-master branches though.

So to clarify, if the goal is fast development or experimental/incomplete 
changes then this will not be it. If that is the case, please consider what I’ve 
suggested: fork the repo, then share access, and open PRs to merge bigger 
complete chunks when ready. This has been done many times in other projects.



From: Xun Liu 
Sent: Sunday, March 31, 2019 8:12 AM
To: Jongyoul Lee
Cc: Felix Cheung; H GHOSH; Jeff Zhang; Morkovkin, Basil; dev; moon soo Lee
Subject: Re: Submitting code

Hi

I agree to create an independent development branch.
This branch is only used to develop the workflow feature.
This ensures that the master branch code is not corrupted.

I will be working with HOMAGNI GHOSH & Basil Morkovkin to
often rebase this branch onto master and
keep the workflow branch as up to date as possible.

Jongyoul Lee mailto:jongy...@gmail.com>> 于2019年3月31日周日 
下午11:04写道:
Hello,

If this work doesn’t block any others, I agree with making a develop branch. 
BTW, is it possible to give permissions to non-committers against some branches 
under the apache/zeppelin?

JL

On Sun, Mar 31, 2019 at 18:17 Xun Liu 
mailto:liuxun...@gmail.com>> wrote:
Thank you very much for your prompt reply.

In an additional email, Basil Morkovkin proposed the idea of 
optimizing zeppelin's Scheduler.java.
So the development of workflow may refactor some of the code and 
processes of zeppelin.
So I suggest also creating a development branch for workflow.
I will develop with HOMAGNI GHOSH & Basil Morkovkin on this branch.
Once all functions are developed, we will record a very detailed operation video.
After everyone’s approval, the code in this development branch will then be 
merged, ticket by ticket, into the master branch.

Because I am not familiar with the open source community workflow,
it has brought some confusion to HOMAGNI GHOSH & Basil Morkovkin.
I feel very sorry. I will work hard to assist HOMAGNI GHOSH & Basil Morkovkin 
in the development of Workflow.

Jeff, please help me create a development branch called Workflow. Thank you!
:-)

Jeff Zhang mailto:zjf...@gmail.com>> 于2019年3月31日周日 下午1:46写道:
Making a new branch make sense to me, if no objection, I will create a branch 
for you.


Jongyoul Lee mailto:jongy...@gmail.com>> 于2019年3月31日周日 
下午12:29写道:
Hi,

Basically, all PRs should be merged separately in a master branch. By the same 
rule, if you have a big task which has several small tasks, all sub tasks 
should be reviewed and merged separately with a complete small function, even 
if it changes some behaviors.

Making branches help sometimes for some contributors but on the other hand, it 
might not have a chance to be reviewed by others.

Regards,
JL

On Sun, Mar 31, 2019 at 12:48 Xun Liu 
mailto:liuxun...@gmail.com>> wrote:
HI,Zeppelin PMC

I am a contributor to zeppelin, Xun Liu.
While working on Zeppelin for GSoC 2019, I ran into a problem I can't solve. Who 
can help me?
Two students (HOMAGNI GHOSH & Basil Morkovkin) selected zeppelin workflow as 
their GSOC 2019 project.
JIRA: https://issues.apache.org/jira/browse/ZEPPELIN-4018

I keep in touch with them by email because workflow is a big feature.
Some sub-tasks should be created to do this.

Now that we have a function, we need to submit the code,
Where should their code be submitted?

These two students asked their questions, and
I feel obligated to report back to you.
My suggestion is to create a development branch in zeppelin
to merge their code into,
and after all of workflow has been developed and has passed the system test,
then merge it into the master branch.
What do you think?


-- Forwarded message -
发件人: Morkovkin, Basil 
mailto:morkovkin...@phystech.edu>>
Date: 2019年3月29日周五 下午9:26
Subject: Submitting code
To: Xun Liu mailto:liuxun...@gmail.com>>


Hi! I have an organization question: how do we submit the code for sub-tasks of 
ZEPPELIN-4018? Will we gather all the code in a separate branch until all 
features are implemented or just gather all in the master branch?


:)


Best regards, Basil Morkovkin
--
이종열, Jongyoul Lee, 李宗烈
http://madeng.net


--
Best Regards

Jeff Zhang
--
이종열, Jongyoul Lee, 李宗烈
http://madeng.net


Re: [VOTE] Release Apache Livy 0.6.0 (incubating) based on RC2

2019-03-29 Thread Felix Cheung
I’m happy to give +1 (binding)

But I definitely ran into more problems with this, and it took more time than I
anticipated.


checked:
- name includes incubating
- DISCLAIMER exists
Signature and hashes
- LICENSE and NOTICE all good
Copyright year, url to and copy full text for license
- No unexpected binary files (except below)
- All source files have ASF headers
- Can compile from source (with issues)


I switched to another environment and found out I needed to manually pip
install configparser just to build (with skipTest)

And then without skipTest it failed even earlier on the rsc tests

Running org.apache.livy.rsc.TestSparkClient

Tests run: 19, Failures: 1, Errors: 3, Skipped: 0, Time elapsed: 30.948 sec
<<< FAILURE! - in org.apache.livy.rsc.TestSparkClient

testJobSubmission(org.apache.livy.rsc.TestSparkClient)  Time elapsed: 0.486
sec  <<< FAILURE!

org.mockito.exceptions.verification.WantedButNotInvoked:


Wanted but not invoked:

listener.onJobStarted(

org.apache.livy.rsc.JobHandleImpl@75805562

);

-> at org.apache.livy.rsc.TestSparkClient$1.call(TestSparkClient.java:101)

And then the earlier python2 utf-8 issue.

Quite a number of binary resources are in the src release (ttf, svg, etc.).
Also, this looks like the standard bootstrap stylesheet - does it have to be in the
src release (vs. being included only in the binary release)?
docs/assets/themes/apache/bootstrap/css/bootstrap.css

If that's doc-only, there's another similar copy at
server/src/main/resources/org/apache/livy/server/ui/static/css/bootstrap.min.css



On Thu, Mar 28, 2019 at 4:05 PM Marcelo Vanzin 
wrote:

> On Thu, Mar 28, 2019 at 3:57 PM Felix Cheung 
> wrote:
>
> > I have LANG=“en_US.UTF-8”
> > I tried a couple of things finally it passed when I use virtualenv - my
> > python is Python 3, forcing that to Python 2 passed the test.
>
>
> Hmmm, that tells me that the python 3 path in the fake shell might not be
> as unicode-safe as it seems, and that this test should also be running
> against python 3...
>
> As a separate issue, things could be enhanced so that they force use of
> "python2" when running the python 2 tests.
>
> But in the spirit of "these are not regressions, just bugs", I'd rather not
> block the release for that. (I filed a but for the ASCII thing already,
> I'll file others for the above.)
>
>
> > However, now
> > another test failed (maybe connection blocked by firewall?)
> >
>
> That seems likely, if you have a local firewall? (8998 is Livy's default
> port. Or maybe you have something running on that port and the tests should
> be trying an ephemeral one...)
>
>
> > error_msg = "Connection refused by Responses: POST
> > http://machine:8998/sessions/ doesn't match Responses Mock"
> > response = ConnectionError(u"Connection refused by Responses: POST
> > http://machine:8998/sessions/doesn't match Responses Mock",)
> >
>
>
> --
> Marcelo
>


Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-03-29 Thread Felix Cheung
I don’t take that to mean Sept 2019 is end of life for Python 3.5, though. It’s just 
giving the date of the next release.

In any case I think in the next release it will be great to get more Python 3.x 
release test coverage.




From: shane knapp 
Sent: Friday, March 29, 2019 4:46 PM
To: Bryan Cutler
Cc: Felix Cheung; Hyukjin Kwon; dev
Subject: Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

i'm not opposed to 3.6 at all.

On Fri, Mar 29, 2019 at 4:16 PM Bryan Cutler 
mailto:cutl...@gmail.com>> wrote:
PyArrow dropping Python 3.4 was mainly due to support going away at Conda-Forge 
and other dependencies also dropping it.  I think we better upgrade Jenkins 
Python while we are at it.  Are you all against jumping to Python 3.6 so we are 
not in the same boat in September?

On Thu, Mar 28, 2019 at 7:58 PM Felix Cheung 
mailto:felixcheun...@hotmail.com>> wrote:
3.4 is end of life but 3.5 is not. From your link

we expect to release Python 3.5.8 around September 2019.




From: shane knapp mailto:skn...@berkeley.edu>>
Sent: Thursday, March 28, 2019 7:54 PM
To: Hyukjin Kwon
Cc: Bryan Cutler; dev; Felix Cheung
Subject: Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

looks like the same for 3.5...   https://www.python.org/dev/peps/pep-0478/

let's pick a python version and start testing.

On Thu, Mar 28, 2019 at 7:52 PM shane knapp 
mailto:skn...@berkeley.edu>> wrote:

If there was, it looks inevitable to upgrade Jenkins\s Python from 3.4 to 3.5.

this is inevitable.  3.4s final release was 10 days ago 
(https://www.python.org/dev/peps/pep-0429/) so we're basically EOL.


--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: [k8s][jenkins] spark dev tool docs now have k8s+minikube instructions!

2019-03-29 Thread Felix Cheung
Definitely the part on the PR. Thanks!



From: shane knapp 
Sent: Thursday, March 28, 2019 11:19 AM
To: dev; Stavros Kontopoulos
Subject: [k8s][jenkins] spark dev tool docs now have k8s+minikube instructions!

https://spark.apache.org/developer-tools.html

search for "Testing K8S".

this is pretty much how i build and test PRs locally...  the commands there are 
lifted straight from the k8s integration test jenkins build, so they might 
require a little tweaking to better suit your laptop/server.

k8s is great (except when it's not), and it's really quite easy to get set up 
(except when it's not).  stackoverflow is your friend, and the minikube slack 
was really useful.

some of this is a little hacky (running the mount process in the background, 
for example), but there's a lot of development on minikube right now...  the 
k8s project understands the importance of minikube and has dedicated 
engineering resources involved.

and finally, if you have a suggestion for the docs, open a PR!  they are always 
welcome!

shane

ps- and a special thanks to @Stavros 
Kontopoulos and the PR from hell for 
throwing me in the deep end of k8s.  :)
--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: [VOTE] Release Apache Spark 2.4.1 (RC9)

2019-03-29 Thread Felix Cheung
+1

build source
R tests
R package CRAN check locally, r-hub



From: d_t...@apple.com on behalf of DB Tsai 
Sent: Wednesday, March 27, 2019 11:31 AM
To: dev
Subject: [VOTE] Release Apache Spark 2.4.1 (RC9)

Please vote on releasing the following candidate as Apache Spark version 2.4.1.

The vote is open until March 30 PST and passes if a majority +1 PMC votes are 
cast, with
a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 2.4.1
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v2.4.1-rc9 (commit 
58301018003931454e93d8a309c7149cf84c279e):
https://github.com/apache/spark/tree/v2.4.1-rc9

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.4.1-rc9-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1319/

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.4.1-rc9-docs/

The list of bug fixes going into 2.4.1 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/2.4.1

FAQ

=
How can I help test this release?
=

If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running on this release candidate, then
reporting any regressions.

If you're working in PySpark you can set up a virtual env and install
the current RC and see if anything important breaks, in the Java/Scala
you can add the staging repository to your projects resolvers and test
with the RC (make sure to clean up the artifact cache before/after so
you don't end up building with a out of date RC going forward).

===
What should happen to JIRA tickets still targeting 2.4.1?
===

The current list of open tickets targeted at 2.4.1 can be found at:
https://issues.apache.org/jira/projects/SPARK and search for "Target Version/s" 
= 2.4.1

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else please retarget to an
appropriate release.

==
But my bug isn't fixed?
==

In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is something which is a regression
that has not been correctly targeted please ping me or a committer to
help target the issue.


DB Tsai | Siri Open Source Technologies [not a contribution] |  Apple, Inc


-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Release Apache Spark 2.4.1 (RC9)

2019-03-29 Thread Felix Cheung
(I think the .invalid is added by the list server)

Personally I’d rather everyone just +1 or -1, and not state whether it is binding. 
It’s really the responsibility of the RM to confirm if a vote is binding. 
Mistakes have been made otherwise.



From: Marcelo Vanzin 
Sent: Thursday, March 28, 2019 3:56 PM
To: dev
Subject: Re: [VOTE] Release Apache Spark 2.4.1 (RC9)

(Anybody knows what's the deal with all the .invalid e-mail addresses?)

Anyway. ASF has voting rules, and some things like releases follow
specific rules:
https://www.apache.org/foundation/voting.html#ReleaseVotes

So, for releases, ultimately, the only votes that "count" towards the
final tally are PMC votes. But everyone is welcome to vote, especially
if they have a reason to -1 a release. PMC members can use that to
guide how they vote, or the RM can use that to drop the RC
unilaterally if he agrees with the reason.


On Thu, Mar 28, 2019 at 3:47 PM Jonatan Jäderberg
 wrote:
>
> +1 (user vote)
>
> btw what to call a vote that is not pmc or committer?
> Some people use "non-binding”, but nobody says “my vote is binding”, and if 
> some vote is important to me, I still need to look up the who’s-who of the 
> project to be able to tally the votes.
> I like `user vote` for someone who has their say but is not speaking with any 
> authority (i.e., not pmc/committer). wdyt?
>
> Also, let’s get this release out the door!
>
> cheers,
> Jonatan
>
> On 28 Mar 2019, at 21:31, DB Tsai  wrote:
>
> +1 from myself
>
> On Thu, Mar 28, 2019 at 3:14 AM Mihaly Toth  
> wrote:
>>
>> +1 (non-binding)
>>
>> Thanks, Misi
>>
>> Sean Owen  ezt írta (időpont: 2019. márc. 28., Cs, 0:19):
>>>
>>> +1 from me - same as last time.
>>>
>>> On Wed, Mar 27, 2019 at 1:31 PM DB Tsai  wrote:
>>> >
>>> > Please vote on releasing the following candidate as Apache Spark version 
>>> > 2.4.1.
>>> >
>>> > The vote is open until March 30 PST and passes if a majority +1 PMC votes 
>>> > are cast, with
>>> > a minimum of 3 +1 votes.
>>> >
>>> > [ ] +1 Release this package as Apache Spark 2.4.1
>>> > [ ] -1 Do not release this package because ...
>>> >
>>> > To learn more about Apache Spark, please see http://spark.apache.org/
>>> >
>>> > The tag to be voted on is v2.4.1-rc9 (commit 
>>> > 58301018003931454e93d8a309c7149cf84c279e):
>>> > https://github.com/apache/spark/tree/v2.4.1-rc9
>>> >
>>> > The release files, including signatures, digests, etc. can be found at:
>>> > https://dist.apache.org/repos/dist/dev/spark/v2.4.1-rc9-bin/
>>> >
>>> > Signatures used for Spark RCs can be found in this file:
>>> > https://dist.apache.org/repos/dist/dev/spark/KEYS
>>> >
>>> > The staging repository for this release can be found at:
>>> > https://repository.apache.org/content/repositories/orgapachespark-1319/
>>> >
>>> > The documentation corresponding to this release can be found at:
>>> > https://dist.apache.org/repos/dist/dev/spark/v2.4.1-rc9-docs/
>>> >
>>> > The list of bug fixes going into 2.4.1 can be found at the following URL:
>>> > https://issues.apache.org/jira/projects/SPARK/versions/2.4.1
>>> >
>>> > FAQ
>>> >
>>> > =
>>> > How can I help test this release?
>>> > =
>>> >
>>> > If you are a Spark user, you can help us test this release by taking
>>> > an existing Spark workload and running on this release candidate, then
>>> > reporting any regressions.
>>> >
>>> > If you're working in PySpark you can set up a virtual env and install
>>> > the current RC and see if anything important breaks, in the Java/Scala
>>> > you can add the staging repository to your projects resolvers and test
>>> > with the RC (make sure to clean up the artifact cache before/after so
>>> > you don't end up building with a out of date RC going forward).
>>> >
>>> > ===
>>> > What should happen to JIRA tickets still targeting 2.4.1?
>>> > ===
>>> >
>>> > The current list of open tickets targeted at 2.4.1 can be found at:
>>> > https://issues.apache.org/jira/projects/SPARK and search for "Target 
>>> > Version/s" = 2.4.1
>>> >
>>> > Committers should look at those and triage. Extremely important bug
>>> > fixes, documentation, and API tweaks that impact compatibility should
>>> > be worked on immediately. Everything else please retarget to an
>>> > appropriate release.
>>> >
>>> > ==
>>> > But my bug isn't fixed?
>>> > ==
>>> >
>>> > In order to make timely releases, we will typically not hold the
>>> > release unless the bug in question is a regression from the previous
>>> > release. That being said, if there is something which is a regression
>>> > that has not been correctly targeted please ping me or a committer to
>>> > help target the issue.
>>> >
>>> >
>>> > DB Tsai | Siri Open Source Technologies [not a contribution] |  Apple, 
>>> > Inc
>>> >
>>> >
>>> > 

Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-03-28 Thread Felix Cheung
3.4 is end of life but 3.5 is not. From your link

we expect to release Python 3.5.8 around September 2019.




From: shane knapp 
Sent: Thursday, March 28, 2019 7:54 PM
To: Hyukjin Kwon
Cc: Bryan Cutler; dev; Felix Cheung
Subject: Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

looks like the same for 3.5...   https://www.python.org/dev/peps/pep-0478/

let's pick a python version and start testing.

On Thu, Mar 28, 2019 at 7:52 PM shane knapp 
mailto:skn...@berkeley.edu>> wrote:

If there was, it looks inevitable to upgrade Jenkins\s Python from 3.4 to 3.5.

this is inevitable.  3.4s final release was 10 days ago 
(https://www.python.org/dev/peps/pep-0429/) so we're basically EOL.


--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: [VOTE] Release Apache Livy 0.6.0 (incubating) based on RC2

2019-03-28 Thread Felix Cheung
I have LANG=“en_US.UTF-8”
I tried a couple of things finally it passed when I use virtualenv - my
python is Python 3, forcing that to Python 2 passed the test. However, now
another test failed (maybe connection blocked by firewall?)


=== FAILURES
===
 test_create_new_session_without_default_config


def test_create_new_session_without_default_config():
> mock_and_validate_create_new_session(False)

src/test/python/livy-tests/client_test.py:105:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
_ _
:3: in wrapper
???
src/test/python/livy-tests/client_test.py:48: in
mock_and_validate_create_new_session
load_defaults=defaults)
src/main/python/livy/client.py:88: in __init__
session_conf_dict).json()['id']
src/main/python/livy/client.py:388: in _create_new_session
headers=self._conn._JSON_HEADERS, data=data)
src/main/python/livy/client.py:500: in send_request
json=data, auth=self._spnego_auth())
.eggs/requests-2.21.0-py2.7.egg/requests/api.py:60: in request
return session.request(method=method, url=url, **kwargs)
.eggs/requests-2.21.0-py2.7.egg/requests/sessions.py:533: in request
resp = self.send(prep, **send_kwargs)
.eggs/requests-2.21.0-py2.7.egg/requests/sessions.py:646: in send
r = adapter.send(request, **kwargs)
.eggs/responses-0.10.6-py2.7.egg/responses.py:626: in unbound_on_send
return self._on_request(adapter, request, *a, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
_ _

self = 
adapter = 
request = 
kwargs = {'cert': None, 'proxies': OrderedDict(), 'stream': False,
'timeout': 10, ...}
match = None, resp_callback = None
error_msg = "Connection refused by Responses: POST
http://machine:8998/sessions/ doesn't match Responses Mock"
response = ConnectionError(u"Connection refused by Responses: POST
http://machine:8998/sessions/doesn't match Responses Mock",)


On Thu, Mar 28, 2019 at 11:52 AM Marcelo Vanzin 
wrote:

> I can reproduce it with this:
>
>  LC_ALL=en_US.ASCII mvn -Pspark-2.4 -Pthriftserver test -pl :livy-repl_2.11
> -Dsuites=*.Python2*
>
> Seems that Livy's fake python shell expects UTF-8 when running on Python 2.
> This is not a new bug, so while we should fix it, not sure we need to fix
> it in this release.
>
>
> On Thu, Mar 28, 2019 at 11:24 AM sebb  wrote:
>
> > On Thu, 28 Mar 2019 at 17:02, Marcelo Vanzin  >
> > wrote:
> >
> > > Are you using a different encoding that UTF-8 in your environment?
> > >
> > > The source file contains unicode escapes, so that shouldn't be the
> > problem.
> > > It may be the test is expecting the output of child processes (in this
> > case
> > > the python interpreter) to be UTF-8.
> > >
> > >
> > If that is the case, then the test case ought to be fixed...
> >
> >
> > > On Wed, Mar 27, 2019 at 10:25 PM Felix Cheung 
> > > wrote:
> > >
> > > > This test is consistently failing when I build, any idea what’s wrong
> > in
> > > my
> > > > setup?
> > > >
> > > > - should print unicode correctly *** FAILED *** (101 milliseconds)
> > > > ExecuteSuccess(JObject(List((text/plain,JString(☺) did not
> equal
> > > > ExecuteSuccess(JObject(List((text/plain,JString(☺)
> > > > (PythonInterpreterSpec.scala:272)
> > > >
> > > >
> > > > On Tue, Mar 26, 2019 at 1:36 PM Marcelo Vanzin
> > >  > > > >
> > > > wrote:
> > > >
> > > > > The Livy PPMC has voted to release Livy 0.6.0 RC2 as the next Livy
> > > > release.
> > > > >
> > > > > Livy enables programmatic, fault-tolerant, multi-tenant submission
> of
> > > > > Spark jobs from web/mobile apps (no Spark client needed). So,
> > multiple
> > > > > users can interact with your Spark cluster concurrently and
> reliably.
> > > > >
> > > > > Vote thread:
> > > > >
> > > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/24c6c184f33c611aa83ec5d1c9948c96610b36df503b5e7f100ff4a2@%3Cdev.livy.apache.org%3E
> > > > >
> > > > > (Note I messed up the subject on the first e-mail, that thread is
> for
> > > > > the RC2 vote.)
> > > > >
> > > > > Result thread:
> > > > >
> > > > >
> > > >
> > >
> >
> https://lists.apache.org/thread.html/70c715f6394f06f0a49f76671b0f57cd1cdca35f7862a9ad2cf87fd7@%3Cdev.livy.apache.org%3E

Re: spark.submit.deployMode: cluster

2019-03-28 Thread Felix Cheung
If anyone wants to improve docs please create a PR.

lol


But seriously you might want to explore other projects that manage job 
submission on top of spark instead of rolling your own with spark-submit.



From: Pat Ferrel 
Sent: Tuesday, March 26, 2019 2:38 PM
To: Marcelo Vanzin
Cc: user
Subject: Re: spark.submit.deployMode: cluster

Ahh, thank you indeed!

It would have saved us a lot of time if this had been documented. I know, OSS 
so contributions are welcome… I can also imagine your next comment; “If anyone 
wants to improve docs see the Apache contribution rules and create a PR.” or 
something like that.

BTW, the code where the context is known and can be used is what I’d call a 
Driver, and since all code is copied to nodes and is known in jars, it was not 
obvious to us that this rule existed, but it does make sense.

We will need to refactor our code to use spark-submit it appears.

Thanks again.


From: Marcelo Vanzin 
Reply: Marcelo Vanzin 
Date: March 26, 2019 at 1:59:36 PM
To: Pat Ferrel 
Cc: user 
Subject:  Re: spark.submit.deployMode: cluster

If you're not using spark-submit, then that option does nothing.

If by "context creation API" you mean "new SparkContext()" or an
equivalent, then you're explicitly creating the driver inside your
application.
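
A minimal sketch of what that looks like (master URL and app name are
hypothetical): constructing the context in-process makes the launching JVM the
driver, so the deployMode setting is ignored unless spark-submit is used.

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("embedded-driver-example")
  .setMaster("spark://master-host:7077")      // standalone master URL, assumed
  .set("spark.submit.deployMode", "cluster")  // no effect without spark-submit
val sc = new SparkContext(conf)               // the driver now runs in this JVM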

On Tue, Mar 26, 2019 at 1:56 PM Pat Ferrel 
mailto:p...@occamsmachete.com>> wrote:
>
> I have a server that starts a Spark job using the context creation API. It 
> DOES NOY use spark-submit.
>
> I set spark.submit.deployMode = “cluster”
>
> In the GUI I see 2 workers with 2 executors. The link for running application 
> “name” goes back to my server, the machine that launched the job.
>
> This is spark.submit.deployMode = “client” according to the docs. I set the 
> Driver to run on the cluster but it runs on the client, ignoring the 
> spark.submit.deployMode.
>
> Is this as expected? It is documented nowhere I can find.
>


--
Marcelo


Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-03-28 Thread Felix Cheung
That’s not necessarily bad. I don’t know if we have plans to ever release any 
new 2.2.x or 2.3.x at this point, and we can message this “supported version” of 
Python change for any new 2.4 release.

Besides we could still support python 3.4 - it’s just more complicated to test 
manually without Jenkins coverage.



From: shane knapp 
Sent: Tuesday, March 26, 2019 12:11 PM
To: Bryan Cutler
Cc: dev
Subject: Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

i'm pretty certain that i've got a solid python 3.5 conda environment ready to 
be deployed, but this isn't a minor change to the build system and there might 
be some bugs to iron out.

another problem is that the current python 3.4 environment is hard-coded into 
both the build scripts on jenkins (all over the place) and the codebase 
(thankfully in only one spot):  export PATH=/home/anaconda/envs/py3k/bin:$PATH

this means that every branch (master, 2.x, etc) will test against whatever 
version of python lives in that conda environment.  if we upgrade to 3.5, all 
branches will test against this version.  changing the build and test infra to 
support testing against 2.7, 3.4 or 3.5 based on branch is definitely 
non-trivial...

thoughts?




On Tue, Mar 26, 2019 at 11:39 AM Bryan Cutler 
mailto:cutl...@gmail.com>> wrote:
Thanks Hyukjin.  The plan is to get this done for 3.0 only.  Here is a link to 
the JIRA https://issues.apache.org/jira/browse/SPARK-27276.  Shane is also 
correct in that newer versions of pyarrow have stopped support for Python 3.4, 
so we should probably have Jenkins test against 2.7 and 3.5.

On Mon, Mar 25, 2019 at 9:44 PM Reynold Xin 
mailto:r...@databricks.com>> wrote:

+1 on doing this in 3.0.


On Mon, Mar 25, 2019 at 9:31 PM, Felix Cheung 
mailto:felixcheun...@hotmail.com>> wrote:
I’m +1 if 3.0



From: Sean Owen mailto:sro...@gmail.com>>
Sent: Monday, March 25, 2019 6:48 PM
To: Hyukjin Kwon
Cc: dev; Bryan Cutler; Takuya UESHIN; shane knapp
Subject: Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

I don't know a lot about Arrow here, but seems reasonable. Is this for
Spark 3.0 or for 2.x? Certainly, requiring the latest for Spark 3
seems right.

On Mon, Mar 25, 2019 at 8:17 PM Hyukjin Kwon 
<gurwls...@gmail.com> wrote:
>
> Hi all,
>
> We really need to upgrade the minimal version soon. It's actually slowing 
> down the PySpark dev, for instance, by the overhead that sometimes we need 
> currently to test all multiple matrix of Arrow and Pandas. Also, it currently 
> requires to add some weird hacks or ugly codes. Some bugs exist in lower 
> versions, and some features are not supported in low PyArrow, for instance.
>
> Per the recommendation of Bryan (an Apache Arrow + Spark committer, FWIW), and in my 
> opinion as well, we should increase the minimal version to 0.12.x. 
> (Also, note that Pandas <> Arrow is an experimental feature.)
>
> So, Bryan and I will proceed with this roughly in a few days if there are no 
> objections, assuming we're fine with increasing it to 0.12.x. Please let me 
> know if there are any concerns.
>
> For clarification, this requires some jobs in Jenkins to upgrade the minimal 
> version of PyArrow (I cc'ed Shane as well).
>
> PS: I roughly heard that Shane's busy for some work stuff .. but it's kind of 
> important in my perspective.
>

-
To unsubscribe e-mail: 
dev-unsubscr...@spark.apache.org



--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: [VOTE] Release Apache Livy 0.6.0 (incubating) based on RC2

2019-03-28 Thread Felix Cheung
I see, thanks. Perhaps it's the way I unzip the src zip file then. Is there a
better approach to unpack it?


On Thu, Mar 28, 2019 at 2:03 AM sebb  wrote:

> On Thu, 28 Mar 2019 at 05:25, Felix Cheung  wrote:
>
> > This test is consistently failing when I build, any idea what’s wrong in
> my
> > setup?
> >
> > - should print unicode correctly *** FAILED *** (101 milliseconds)
> > ExecuteSuccess(JObject(List((text/plain,JString(☺) did not equal
> > ExecuteSuccess(JObject(List((text/plain,JString(☺)
> > (PythonInterpreterSpec.scala:272)
> >
> >
> I have seen such errors before; usually it is because the test source file
> has been incorrectly encoded/decoded at some point.
> Although they are harder to maintain, it may be worth coding the test files
> using Unicode escapes rather than the non-ASCII characters.
> ASCII is much harder to mangle...
>
>
>
> > On Tue, Mar 26, 2019 at 1:36 PM Marcelo Vanzin
>  > >
> > wrote:
> >
> > > The Livy PPMC has voted to release Livy 0.6.0 RC2 as the next Livy
> > release.
> > >
> > > Livy enables programmatic, fault-tolerant, multi-tenant submission of
> > > Spark jobs from web/mobile apps (no Spark client needed). So, multiple
> > > users can interact with your Spark cluster concurrently and reliably.
> > >
> > > Vote thread:
> > >
> > >
> >
> https://lists.apache.org/thread.html/24c6c184f33c611aa83ec5d1c9948c96610b36df503b5e7f100ff4a2@%3Cdev.livy.apache.org%3E
> > >
> > > (Note I messed up the subject on the first e-mail, that thread is for
> > > the RC2 vote.)
> > >
> > > Result thread:
> > >
> > >
> >
> https://lists.apache.org/thread.html/70c715f6394f06f0a49f76671b0f57cd1cdca35f7862a9ad2cf87fd7@%3Cdev.livy.apache.org%3E
> > >
> > > The RC is based on tag v0.6.0-incubating-rc2:
> > > https://github.com/apache/incubator-livy/commit/28be98cabc
> > >
> > > The release files can be found here:
> > >
> >
> https://dist.apache.org/repos/dist/dev/incubator/livy/0.6.0-incubating-rc2/
> > >
> > > The staged maven artifacts can be found here:
> > > https://repository.apache.org/content/repositories/orgapachelivy-1008
> > >
> > > The list of resolved JIRAs in this release can be found here:
> > > https://issues.apache.org/jira/projects/LIVY/versions/12342736
> > >
> > > Vote will be open for at least 72 hours. Thanks!
> > >
> > > --
> > > Marcelo
> > >
> > > -
> > > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > > For additional commands, e-mail: general-h...@incubator.apache.org
> > >
> > >
> >
>
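
As an aside, a minimal, hypothetical illustration of the Unicode-escape suggestion above (the object name and values are not taken from the Livy test suite): writing the expected value as an escape keeps the test source pure ASCII, so a wrong charset during checkout or build cannot mangle it, whereas an embedded literal can be corrupted.

object UnicodeEscapeSketch {
  def main(args: Array[String]): Unit = {
    val escaped = "\u263A"  // WHITE SMILING FACE written as an ASCII-only escape
    val literal = "☺"       // the same character embedded directly in the source file
    // If the source file is decoded with the wrong charset, only the literal form breaks.
    assert(escaped == literal, s"expected <$escaped> but got <$literal>")
    println("unicode comparison passed")
  }
}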


Re: [VOTE] Release Apache Livy 0.6.0 (incubating) based on RC2

2019-03-27 Thread Felix Cheung
This test is consistently failing when I build, any idea what’s wrong in my
setup?

- should print unicode correctly *** FAILED *** (101 milliseconds)
ExecuteSuccess(JObject(List((text/plain,JString(☺) did not equal
ExecuteSuccess(JObject(List((text/plain,JString(☺)
(PythonInterpreterSpec.scala:272)


On Tue, Mar 26, 2019 at 1:36 PM Marcelo Vanzin 
wrote:

> The Livy PPMC has voted to release Livy 0.6.0 RC2 as the next Livy release.
>
> Livy enables programmatic, fault-tolerant, multi-tenant submission of
> Spark jobs from web/mobile apps (no Spark client needed). So, multiple
> users can interact with your Spark cluster concurrently and reliably.
>
> Vote thread:
>
> https://lists.apache.org/thread.html/24c6c184f33c611aa83ec5d1c9948c96610b36df503b5e7f100ff4a2@%3Cdev.livy.apache.org%3E
>
> (Note I messed up the subject on the first e-mail, that thread is for
> the RC2 vote.)
>
> Result thread:
>
> https://lists.apache.org/thread.html/70c715f6394f06f0a49f76671b0f57cd1cdca35f7862a9ad2cf87fd7@%3Cdev.livy.apache.org%3E
>
> The RC is based on tag v0.6.0-incubating-rc2:
> https://github.com/apache/incubator-livy/commit/28be98cabc
>
> The release files can be found here:
> https://dist.apache.org/repos/dist/dev/incubator/livy/0.6.0-incubating-rc2/
>
> The staged maven artifacts can be found here:
> https://repository.apache.org/content/repositories/orgapachelivy-1008
>
> The list of resolved JIRAs in this release can be found here:
> https://issues.apache.org/jira/projects/LIVY/versions/12342736
>
> Vote will be open for at least 72 hours. Thanks!
>
> --
> Marcelo
>
> -
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org
>
>


Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

2019-03-25 Thread Felix Cheung
I’m +1 if 3.0



From: Sean Owen 
Sent: Monday, March 25, 2019 6:48 PM
To: Hyukjin Kwon
Cc: dev; Bryan Cutler; Takuya UESHIN; shane knapp
Subject: Re: Upgrading minimal PyArrow version to 0.12.x [SPARK-27276]

I don't know a lot about Arrow here, but seems reasonable. Is this for
Spark 3.0 or for 2.x? Certainly, requiring the latest for Spark 3
seems right.

On Mon, Mar 25, 2019 at 8:17 PM Hyukjin Kwon  wrote:
>
> Hi all,
>
> We really need to upgrade the minimal version soon. It's actually slowing 
> down the PySpark dev, for instance, by the overhead that sometimes we need 
> currently to test all multiple matrix of Arrow and Pandas. Also, it currently 
> requires to add some weird hacks or ugly codes. Some bugs exist in lower 
> versions, and some features are not supported in low PyArrow, for instance.
>
> Per the recommendation of Bryan (an Apache Arrow + Spark committer, FWIW), and in my 
> opinion as well, we should increase the minimal version to 0.12.x. 
> (Also, note that Pandas <> Arrow is an experimental feature.)
>
> So, Bryan and I will proceed with this roughly in a few days if there are no 
> objections, assuming we're fine with increasing it to 0.12.x. Please let me 
> know if there are any concerns.
>
> For clarification, this requires some jobs in Jenkins to upgrade the minimal 
> version of PyArrow (I cc'ed Shane as well).
>
> PS: I roughly heard that Shane's busy for some work stuff .. but it's kind of 
> important in my perspective.
>

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Edit access to apache wiki

2019-03-24 Thread Felix Cheung
It happened again, so I’m pretty sure a certain email provider is filtering this as spam.


Neha, please give your account name (register at
https://wiki.apache.org/ first) so you can be added.

Also as discussed Pinot is not required to report April.



--
> *From:* Neha Pawar 
> *Sent:* Sunday, March 24, 2019 9:59 AM
> *To:* general@incubator.apache.org
> *Subject:* Re: Edit access to apache wiki
>
> Bumping this up. Please help with this.
>
>
> Thanks,
>
> Neha
>
> 
> From: Neha Pawar
> Sent: Thursday, March 21, 2019 9:29:31 AM
> To: general@incubator.apache.org
> Subject: Re: Edit access to apache wiki
>
>
> Hi,
>
>
> Bumping this up
>
>
> Thanks,
>
> Neha
>
> 
> From: Neha Pawar
> Sent: Wednesday, March 20, 2019 10:29:13 AM
> To: general@incubator.apache.org
> Subject: Edit access to apache wiki
>
>
> Hi,
>
>
> I need edit access to apache wiki. I will be writing the report for
> project Pinot for April 2019 here
> https://wiki.apache.org/incubator/April2019
>
>
> Thank,
>
> Neha
>


Re: Spark - Hadoop custom filesystem service loading

2019-03-23 Thread Felix Cheung
Hmm thanks. Do you have a proposed solution?



From: Jhon Anderson Cardenas Diaz 
Sent: Monday, March 18, 2019 1:24 PM
To: user
Subject: Spark - Hadoop custom filesystem service loading

Hi everyone,

On Spark 2.2.0, if you wanted to create a custom file system implementation, 
you just created an extension of org.apache.hadoop.fs.FileSystem and put the 
canonical name of the custom class in the file 
src/main/resources/META-INF/services/org.apache.hadoop.fs.FileSystem.

Once you imported that jar dependency in your spark-submit application, the 
custom scheme was automatically loaded, and you could start to use it just like 
ds.load("customfs://path").

But on spark 2.4.0 that does not seem to work the same. If you do exactly the 
same you will get an error like "No FileSystem for customfs".

The only way I achieved this on 2.4.0, was specifying the spark property 
spark.hadoop.fs.customfs.impl.

Do you guys consider this as a bug? or is it an intentional change that should 
be documented on somewhere?

Btw, digging a little bit into this, it seems that the cause is that the 
FileSystem is now initialized before the actual dependencies are downloaded from 
the Maven repo (see 
here). 
And as that initialization loads the available filesystems at that point and 
only once, the filesystems in the downloaded jars are not taken into account.

Thanks.
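
A minimal sketch of the workaround described above, assuming a placeholder class com.example.fs.CustomFileSystem (not a real library) that extends org.apache.hadoop.fs.FileSystem and is already on the application classpath; setting spark.hadoop.fs.customfs.impl registers the scheme explicitly even when the META-INF/services entry is not picked up:

import org.apache.spark.sql.SparkSession

object CustomFsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("customfs-sketch")
      // Equivalent to passing --conf spark.hadoop.fs.customfs.impl=... on the command line;
      // spark.hadoop.* settings are copied into the Hadoop Configuration as fs.customfs.impl.
      .config("spark.hadoop.fs.customfs.impl", "com.example.fs.CustomFileSystem")
      .getOrCreate()

    // With the scheme registered, paths using it resolve as before.
    val ds = spark.read.load("customfs://bucket/path")
    ds.show()
  }
}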


Re: [VOTE] Release Apache Spark 2.4.1 (RC8)

2019-03-20 Thread Felix Cheung
Reposting for shane here

[SPARK-27178]
https://github.com/apache/spark/commit/342e91fdfa4e6ce5cc3a0da085d1fe723184021b

It is problematic too, and it’s not in the rc8 cut.

https://github.com/apache/spark/commits/branch-2.4

(Personally I don’t want to delay 2.4.1 either..)


From: Sean Owen 
Sent: Wednesday, March 20, 2019 11:18 AM
To: DB Tsai
Cc: dev
Subject: Re: [VOTE] Release Apache Spark 2.4.1 (RC8)

+1 for this RC. The tag is correct, licenses and sigs check out, tests
of the source with most profiles enabled works for me.

On Tue, Mar 19, 2019 at 5:28 PM DB Tsai  wrote:
>
> Please vote on releasing the following candidate as Apache Spark version 
> 2.4.1.
>
> The vote is open until March 23 PST and passes if a majority +1 PMC votes are 
> cast, with
> a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 2.4.1
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.4.1-rc8 (commit 
> 746b3ddee6f7ad3464e326228ea226f5b1f39a41):
> https://github.com/apache/spark/tree/v2.4.1-rc8
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.4.1-rc8-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1318/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.4.1-rc8-docs/
>
> The list of bug fixes going into 2.4.1 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/2.4.1
>
> FAQ
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install
> the current RC and see if anything important breaks, in the Java/Scala
> you can add the staging repository to your projects resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with a out of date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 2.4.1?
> ===
>
> The current list of open tickets targeted at 2.4.1 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target 
> Version/s" = 2.4.1
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
>
> DB Tsai | Siri Open Source Technologies [not a contribution] |  Apple, Inc
>
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [discuss] Zeppelin support workflow

2019-03-16 Thread Felix Cheung
I like it!


From: Jongyoul Lee 
Sent: Monday, March 11, 2019 9:05:03 PM
To: dev
Subject: Re: [discuss] Zeppelin support workflow

Thanks for the sharing this kind of discussion.

I'm interested in it. Will see it.

On Mon, Mar 11, 2019 at 10:43 AM Xun Liu  wrote:

> Hello, everyone
>
> Because there are more than 20 interpreters in Zeppelin, data analysts
> can use them to do a variety of data development work,
> and a lot of that data development is interdependent.
> For example, the development of machine learning algorithms requires
> relying on Spark to preprocess data, and so on.
>
> Zeppelin should have built-in workflow capabilities, instead of relying on
> external software to schedule notes in Zeppelin, for the following reasons:
>
> 1. Now that we have moved from the data processing era to the algorithm
> era, once Zeppelin has its own workflow it
> will have a complete ecosystem of data processing and algorithmic
> operations.
> 2. Zeppelin's powerful interactive processing capabilities help algorithm
> engineers improve productivity.
> Zeppelin should give the algorithm engineer more direct control, instead
> of handing the algorithm to other teams (or software) to run the workflow.
> 3. Zeppelin knows more about the processing status of data than Azkaban
> and Airflow,
> so a built-in workflow will have better performance, user experience, and
> control.
>
> Typical use case
> This applies especially in machine learning, because machine learning generally has
> long-running task execution.
> A typical example is as follows:
> 1) First, obtain data from HDFS through spark;
> 2) Clean and convert the data through sparksql;
> 3) Feature extraction of data through spark;
> 4) Tensorflow writing algorithm through hadoop submarine;
> 5) Distribute the tensorflow algorithm as a job to YARN or k8s for batch
> processing;
> 6) Publish the training acquisition model and provide online prediction
> services;
> 7) Model prediction by flink;
> 8) Receive incremental data through flink for incremental update of the
> model;
> Therefore, Zeppelin especially needs the ability to orchestrate
> workflows.
>
> I completed the draft of the zeppelin workflow system design, please
> review, you can directly modify the document or fill in the comments.
>
> JIRA: https://issues.apache.org/jira/browse/ZEPPELIN-4018
> gdoc:
> https://docs.google.com/document/d/1pQjVifOC1knPBuw3LVvby7GyNDXaeBq1ltRg6x4vDxM/edit
>
>
> :-)
>
> Xun Liu
> 2019-03-11



--
이종열, Jongyoul Lee, 李宗烈
http://madeng.net
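
For illustration only, a minimal Spark sketch of steps 1-3 of the pipeline described in Xun Liu's message above (the paths, table, and column names are placeholders); the later steps (Submarine/TensorFlow training, model serving, and Flink) are beyond a short sketch.

import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

object WorkflowStepsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("workflow-steps-sketch").getOrCreate()

    // 1) Obtain data from HDFS through Spark.
    val raw = spark.read.parquet("hdfs:///data/events")

    // 2) Clean and convert the data through Spark SQL.
    raw.createOrReplaceTempView("events")
    val cleaned = spark.sql(
      "SELECT id, CAST(value AS DOUBLE) AS value FROM events WHERE value IS NOT NULL")

    // 3) Feature extraction through Spark.
    val features = new VectorAssembler()
      .setInputCols(Array("value"))
      .setOutputCol("features")
      .transform(cleaned)

    features.write.mode("overwrite").parquet("hdfs:///data/features")
    spark.stop()
  }
}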


Re: [VOTE] Accept DataSketches into the Apache Incubator

2019-03-15 Thread Felix Cheung
+1 (binding)

On Fri, Mar 15, 2019 at 6:19 AM Jean-Baptiste Onofré 
wrote:

> +1 (binding)
>
> Regards
> JB
>
> On 14/03/2019 22:23, Kenneth Knowles wrote:
> > Hi all,
> >
> > We've discussed the proposal for the DataSketches project in [1] and [2].
> > The
> > proposal itself has been put on the wiki [3].
> >
> > Per incubator rules [4] I'd like to call a vote to accept the new
> > "DataSketches" project as a podling in the Apache Incubator.
> >
> > A vote for accepting a new Apache Incubator podling is a majority vote.
> > Everyone is welcome to vote, only Incubator PMC member votes are binding.
> > It would be helpful (but not required) if you could add a comment stating
> > whether your vote is binding or non-binding.
> >
> > This vote will run for at least 72 hours (but I expect to keep it open
> for
> > longer). Please VOTE as follows:
> >
> > [ ] +1 Accept DataSketches into the Apache Incubator
> > [ ] +0 Abstain
> > [ ] -1 Do not accept DataSketches into the Apache Incubator because ...
> >
> > Thanks to everyone who contributed to the proposal and discussions.
> >
> > Kenn
> >
> > [1]
> >
> https://lists.apache.org/thread.html/329354bd6a463dab56c2539972cfa2d6c6da7c75900216d785db4e3b@%3Cgeneral.incubator.apache.org%3E
> > [2]
> >
> https://lists.apache.org/thread.html/c9873cd4fcdc6367bcf530d8fa1ef09f3035f38e7c435e1a79a93885@%3Cgeneral.incubator.apache.org%3E
> > [3] https://wiki.apache.org/incubator/DataSketchesProposal
> > [4] https://incubator.apache.org/guides/proposal.html#the_vote
> >
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
> -
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org
>
>


[jira] [Commented] (SPARK-26910) Re-release SparkR to CRAN

2019-03-13 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792120#comment-16792120
 ] 

Felix Cheung commented on SPARK-26910:
--

2.3.3 failed. We are waiting for 2.4.1 to be released.

> Re-release SparkR to CRAN
> -
>
> Key: SPARK-26910
> URL: https://issues.apache.org/jira/browse/SPARK-26910
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Affects Versions: 2.4.0
>Reporter: Michael Chirico
>    Assignee: Felix Cheung
>Priority: Major
>
> The logical successor to https://issues.apache.org/jira/browse/SPARK-15799
> I don't see anything specifically tracking re-release in the Jira list. It 
> would be helpful to have an issue tracking this to refer to as an outsider, 
> as well as to document what the blockers are in case some outside help could 
> be useful.
>  * Is there a plan to re-release SparkR to CRAN?
>  * What are the major blockers to doing so at the moment?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Season of Docs 2019

2019-03-13 Thread Felix Cheung
https://developers.google.com/season-of-docs/

Sounds like a good idea?


Google announces the inaugural year of Season of Docs, a Google program that 
fosters collaboration between open source projects and technical writers. 
Season of Docs is similar to Summer of Code, but with a focus on open source 
documentation and technical writers. Details are on the website: g.co/seasonofdocs.



Call for Presentations (CFP) is now open for ApacheCon North America in Las Vegas, September 9-13th

2019-03-12 Thread Felix Cheung
See the Machine Learning track!

We’re delighted to announce that the Call for Presentations (CFP) is now open 
for ApacheCon North America in Las Vegas, September 9-13th! As the official 
conference series of the ASF, ApacheCon North America will feature over a dozen 
Apache project summits. We’re looking for talks in a wide variety of categories 
-- anything related to ASF projects and the Apache development process. The CFP 
closes at midnight on May 26th.

In addition, the ASF will be celebrating its 20th Anniversary during the event. 
For more details and to submit a proposal for the CFP, visit

https://apachecon.com/acna19/ . Registration will be opening soon.








Call for Presentations (CFP) is now open for ApacheCon North America in Las Vegas, September 9-13th

2019-03-12 Thread Felix Cheung
See the Big Data track!

We’re delighted to announce that the Call for Presentations (CFP) is now
open for ApacheCon North America in Las Vegas, September 9-13th! As the
official conference series of the ASF, ApacheCon North America will
feature over a dozen Apache project summits, including Cassandra,
Cloudstack, Tomcat, Traffic Control, and more. We’re looking for talks
in a wide variety of categories -- anything related to ASF projects and
the Apache development process. The CFP closes at midnight on May 26th.
In addition, the ASF will be celebrating its 20th Anniversary during the
event. For more details and to submit a proposal for the CFP, visit
https://apachecon.com/acna19/ . Registration will be opening soon.




Call for Presentations (CFP) is now open for ApacheCon North America in Las Vegas, September 9-13th

2019-03-12 Thread Felix Cheung
See the Big Data track or the Machine Learning track!

We’re delighted to announce that the Call for Presentations (CFP) is now
open for ApacheCon North America in Las Vegas, September 9-13th! As the
official conference series of the ASF, ApacheCon North America will
feature over a dozen Apache project summits, including Cassandra,
Cloudstack, Tomcat, Traffic Control, and more. We’re looking for talks
in a wide variety of categories -- anything related to ASF projects and
the Apache development process. The CFP closes at midnight on May 26th.
In addition, the ASF will be celebrating its 20th Anniversary during the
event. For more details and to submit a proposal for the CFP, visit
https://apachecon.com/acna19/ . Registration will be opening soon.




Re: Zeppelin in GSOC 2019

2019-03-10 Thread Felix Cheung
Hi Xun,

Thanks for your work - could you change the title of the email? I think you 
will get more attention for your request to review the design.



From: Xun Liu 
Sent: Sunday, March 10, 2019 12:03 AM
To: Jongyoul Lee; m...@apache.org; Jeff Zhang; Vasiliy Morkovkin
Cc: dev@zeppelin.apache.org
Subject: Re: Zeppelin in GSOC 2019

Hello, everyone,

I have completed the Zeppelin workflow system design; please review it. You can 
directly modify the document or add comments.

JIRA: https://issues.apache.org/jira/browse/ZEPPELIN-4018 

gdoc: 
https://docs.google.com/document/d/1pQjVifOC1knPBuw3LVvby7GyNDXaeBq1ltRg6x4vDxM/edit#
 


:-)

> On Mar 8, 2019, at 2:10 PM, Jeff Zhang wrote:
>
> Hi Liu,
>
> See this link https://community.apache.org/gsoc.html
>
>
> Xun Liu  于2019年3月8日周五 下午1:58写道:
>
>> Hi, Jongyoul Lee, Морковкин
>>
>> I looked up the information about GSoC. Is it still necessary to apply for
>> the Zeppelin community first?
>> I don't know much about GSoC. In addition to helping the project, what
>> other work does the mentor need to do?
>>
>>> On Mar 8, 2019, at 10:01 AM, Xun Liu wrote:
>>>
>>> Hi, Морковкин
>>>
>>> I am very happy to be your mentor for GSOC. :-)
>>> I believe that by completing this work, I can also learn a lot.
>>>
>>> Please watch https://issues.apache.org/jira/browse/ZEPPELIN-4018
>>>
 On Mar 8, 2019, at 12:08 AM, Морковкин, Василий Владимирович <morkovkin...@phystech.edu> wrote:

 Hi! For fun I've sketched a toy prototype of a workflow manager in Scala.
>> It makes it easy to impose dependencies on the execution order of tasks.
>> Check this out: https://scastie.scala-lang.org/aRJGberkQ4CWatyABCOJcQ . It reproduces
>> the flow which is shown in the attached picture.
 Xun Liu, It would be great to clarify whether you agree to be a mentor
>> exactly within GSOC, or without it? :)

 
 Best regards, Basil Morkovkin

 On Thu, Mar 7, 2019 at 11:32, Jeff Zhang <zjf...@gmail.com>:

 Thanks Liu for taking over this, I will help review the design.

 Xun Liu <neliu...@163.com> wrote on Thursday, March 7, 2019 at 4:05 PM:
 Hi Vasiliy Morkovkin

 Thank you very much for your willingness to implement this feature of
>> workflow.
 I will work with you with the highest priority.
 I am planning to update the system design documentation for the workflow
>> first at https://issues.apache.org/jira/browse/ZEPPELIN-4018 .
 Please set the Watcher in ZEPPELIN-4018.
 This way you can get notification messages for document updates in a
>> timely manner.

 We can communicate all the questions in the ZEPPELIN-4018 JIRA comments.
 If you need to, you can email me at liuxun...@gmail.com, and I will reply to you as quickly as I can.
 Do you think this kind of cooperation is OK?


 @moon, @Jeff, @Jongyoul Lee , If interested, Please help us improve our
>> system design. Thanks!

 :-)

> On Mar 7, 2019, at 6:04 AM, Морковкин, Василий Владимирович <morkovkin...@phystech.edu> wrote:
>
> Thank you for such detailed feedback!
> I am definitely interested in working on the workflow implementation with
>> you, Xun Liu! Could you become a mentor in GSoC for this task?
> Some front-end work is not a problem at all.
> I'm ready to work at least 30 hours per week in the summer, while for now
>> I'd like to take some smaller tasks to take a closer look at the existing
>> codebase and to get familiar with your development workflow. Do you have
>> such tasks in mind?
>
> On Wed, Mar 6, 2019 at 05:23, Xun Liu <neliu...@163.com>:
> Hi Vasiliy Morkovkin
>
> I shared my thoughts on the workflow at
>> https://issues.apache.org/jira/browse/ZEPPELIN-4018
>
> Because there are more than 20 interpreters in Zeppelin,
> data analysts can use them to do a variety of data development work,
> and a lot of data development is interdependent. For example,
> the development of machine learning algorithms requires relying on
>> Spark to preprocess data, and so on.
>
> Existing open source workflow software includes Azkaban and Airflow;
> Azkaban is relatively simple and has been used to meet most scenarios,
>> and our company 

Re: Do we have a plan to release the first normal version of IoTDB this month?

2019-03-08 Thread Felix Cheung
Sounds great!


From: Xiangdong Huang 
Sent: Friday, March 8, 2019 12:16:18 AM
To: dev@iotdb.apache.org
Subject: Re: Do we have a plan to release the first normal version of IoTDB 
this month?

That's Great! A binary release version is really needed!

And, if so, at least users can get iotdb-jdbc jars from Maven central
repository...

Best,
---
Xiangdong Huang
School of Software, Tsinghua University



Christofer Dutz wrote on Friday, March 8, 2019 at 3:29 PM:

> I think that's a good idea,
>
> I think I might start pre-reviewing the current state ... No need to be
> listing findings only after an RC.
>
> Chris
>
> Download Outlook for Android
>
> 
> From: Julian Feinauer 
> Sent: Friday, March 8, 2019 8:03:46 AM
> To: dev@iotdb.apache.org
> Subject: AW: Do we have a plan to release the first normal version of
> IoTDB this month?
>
> Hi,
>
> I think a (first apache) release is a good idea. Especially to allow
> people to play around with the artifacts.
>
> After what I have seen the code quality is good and the main functionality
> works well.
>
> Julian
>
>
>  Original message 
> Subject: Re: Do we have a plan to release the first normal version of
> IoTDB this month?
> From: 吴晟 Sheng Wu
> To: dev, dev
> Cc:
>
> I think we should try to do a release, at least a preview version. Three
> months is not a short time for a new project.
> Of course, no rush - just when you think it is ready and it makes sense.
>
>
>
> Sheng Wu
> Apache SkyWalking, ShardingSphere, Zipkin
>
> From Wu Sheng 's phone.
>
>
> -- Original --
> From: yi xu 
> Date: Fri,Mar 8,2019 11:09 AM
> To: dev@iotdb.apache.org 
> Subject: Re: Do we have a plan to release the first normal version of
> IoTDB this month?
>
>
>
> Hi,
>
> Over the last three months, we have improved IoTDB in several ways, such
> as code quality, documentation, and read/write performance. So should we release
> a normal version to our users, since we don't have a normal version right
> now?
>
> Thanks
> XuYi
>


Re: [VOTE] Release Apache Spark 2.4.1 (RC2)

2019-03-07 Thread Felix Cheung
There is SPARK-26604, which we are looking into.


From: Saisai Shao 
Sent: Wednesday, March 6, 2019 6:05 PM
To: shane knapp
Cc: Stavros Kontopoulos; Sean Owen; DB Tsai; Spark dev list; d_t...@apple.com
Subject: Re: [VOTE] Release Apache Spark 2.4.1 (RC2)

Do we have other blocker/critical issues for Spark 2.4.1, or are we waiting for something to 
be fixed? I roughly searched JIRA, and it seems there are no blocker/critical issues 
marked for 2.4.1.

Thanks
Saisai

shane knapp <skn...@berkeley.edu> wrote on Thursday, March 7, 2019 at 4:57 AM:
i'll be popping in to the sig-big-data meeting on the 20th to talk about stuff 
like this.

On Wed, Mar 6, 2019 at 12:40 PM Stavros Kontopoulos 
<stavros.kontopou...@lightbend.com>
wrote:
Yes, it's a tough decision, and as we discussed today 
(https://docs.google.com/document/d/1pnF38NF6N5eM8DlK088XUW85Vms4V2uTsGZvSp8MNIA),
"Kubernetes support window is 9 months, Spark is two years". So we may end up 
with old client versions on branches still supported, like 2.4.x, in the future.
That gives us no choice but to upgrade, if we want to be on the safe side. We 
have tested 3.0.0 with 1.11 internally and it works, but I don't know what it 
means to run with old
clients.


On Wed, Mar 6, 2019 at 7:54 PM Sean Owen 
<sro...@gmail.com> wrote:
If the old client is basically unusable with the versions of K8S
people mostly use now, and the new client still works with older
versions, I could see including this in 2.4.1.

Looking at https://github.com/fabric8io/kubernetes-client#compatibility-matrix
it seems like the 4.1.1 client is needed for 1.10 and above. However
it no longer supports 1.7 and below.
We have 3.0.x, and versions through 4.0.x of the client support the
same K8S versions, so no real middle ground here.

1.7.0 came out June 2017, it seems. 1.10 was March 2018. Minor release
branches are maintained for 9 months per
https://kubernetes.io/docs/setup/version-skew-policy/

Spark 2.4.0 came in Nov 2018. I suppose we could say it should have
used the newer client from the start as at that point (?) 1.7 and
earlier were already at least 7 months past EOL.
If we update the client in 2.4.1, versions of K8S as recently
'supported' as a year ago won't work anymore. I'm guessing there are
still 1.7 users out there? That wasn't that long ago but if the
project and users generally move fast, maybe not.

Normally I'd say, that's what the next minor release of Spark is for;
update if you want later infra. But there is no Spark 2.5.
I presume downstream distros could modify the dependency easily (?) if
needed and maybe already do. It wouldn't necessarily help end users.

Does the 3.0.x client not work at all with 1.10+ or just unsupported.
If it 'basically works but no guarantees' I'd favor not updating. If
it doesn't work at all, hm. That's tough. I think I'd favor updating
the client but think it's a tough call both ways.



On Wed, Mar 6, 2019 at 11:14 AM Stavros Kontopoulos
<stavros.kontopou...@lightbend.com>
wrote:
>
> Yes, Shane Knapp has done the work for that already, and the tests also pass. I 
> am working on a PR now; I could submit it for the 2.4 branch.
> I understand that this is a major dependency update, but the problem I see is 
> that the client version is so old that I don't think it makes
> much sense for current users who are on k8s 1.10, 1.11, 
> etc. (https://github.com/fabric8io/kubernetes-client#compatibility-matrix; 
> 3.0.0 does not even exist in there).
> I don't know what it means to use that old version with current k8s clusters 
> in terms of bugs etc.




--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


[jira] [Commented] (SPARK-26604) Register channel for stream request

2019-03-06 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786397#comment-16786397
 ] 

Felix Cheung commented on SPARK-26604:
--

could we backport this to branch-2.4?

> Register channel for stream request
> ---
>
> Key: SPARK-26604
> URL: https://issues.apache.org/jira/browse/SPARK-26604
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Liang-Chi Hsieh
>Assignee: Liang-Chi Hsieh
>Priority: Minor
> Fix For: 3.0.0
>
>
> Now in {{TransportRequestHandler.processStreamRequest}}, when a stream 
> request is processed, the stream id is not registered with the current 
> channel in stream manager. It should do that so in case of that the channel 
> gets terminated we can remove associated streams from stream requests too.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-26918) All .md should have ASF license header

2019-03-04 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784109#comment-16784109
 ] 

Felix Cheung edited comment on SPARK-26918 at 3/5/19 5:47 AM:
--

[~rmsm...@gmail.com] - you don't need to checkout a tag (or a release) - just 
checkout master into a local branch to test 


was (Author: felixcheung):
[~rmsm...@gmail.com] - you don't need to checkout a tag - just checkout master 
into a local branch to test 

> All .md should have ASF license header
> --
>
> Key: SPARK-26918
> URL: https://issues.apache.org/jira/browse/SPARK-26918
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Felix Cheung
>Priority: Minor
>
> per policy, all md files should have the header, like eg. 
> [https://raw.githubusercontent.com/apache/arrow/master/docs/README.md]
>  or
> [https://raw.githubusercontent.com/apache/hadoop/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/filesystem.md]
>  
> currently it does not
> [https://raw.githubusercontent.com/apache/spark/master/docs/sql-reference.md] 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26918) All .md should have ASF license header

2019-03-04 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784109#comment-16784109
 ] 

Felix Cheung commented on SPARK-26918:
--

[~rmsm...@gmail.com] - you don't need to checkout a tag - just checkout master 
into a local branch to test 

> All .md should have ASF license header
> --
>
> Key: SPARK-26918
> URL: https://issues.apache.org/jira/browse/SPARK-26918
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Felix Cheung
>Priority: Minor
>
> per policy, all md files should have the header, like eg. 
> [https://raw.githubusercontent.com/apache/arrow/master/docs/README.md]
>  or
> [https://raw.githubusercontent.com/apache/hadoop/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/filesystem.md]
>  
> currently it does not
> [https://raw.githubusercontent.com/apache/spark/master/docs/sql-reference.md] 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-03 Thread Felix Cheung
Once again, I’d have to agree with Sean.

Let’s table the meaning of SPIP for another time. I think a few of us are 
trying to understand what “accelerator resource aware” means. As far as I 
know, no one is discussing the API here. But on the Google doc, JIRA, email, and 
off-list, I have seen questions, questions that are greatly concerning, like 
“oh, the scheduler is allocating GPUs, but how does it affect memory”, and many more, 
and so I think finer high-level goals should be defined.





From: Sean Owen 
Sent: Sunday, March 3, 2019 5:24 PM
To: Xiangrui Meng
Cc: Felix Cheung; Xingbo Jiang; Yinan Li; dev; Weichen Xu; Marco Gaido
Subject: Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

I think treating SPIPs as this high-level takes away much of the point
of VOTEing on them. I'm not sure that's even what Reynold is
suggesting elsewhere; we're nowhere near discussing APIs here, just
what 'accelerator aware' even generally means. If the scope isn't
specified, what are we trying to bind with a formal VOTE? The worst I
can say is that this doesn't mean much, so the outcome of the vote
doesn't matter. The general ideas seems fine to me and I support
_something_ like this.

I think the subtext concern is that SPIPs become a way to request
cover to make a bunch of decisions separately, later. This is, to some
extent, how it has to work. A small number of interested parties need
to decide the details coherently, not design the whole thing by
committee, with occasional check-ins for feedback. There's a balance
between that, and using the SPIP as a license to go finish a design
and proclaim it later. That's not anyone's bad-faith intention, just
the risk of deferring so much.

Mesos support is not a big deal by itself but a fine illustration of
the point. That seems like a fine question of scope now, even if the
'how' or some of the 'what' can be decided later. I raised an eyebrow
here at the reply that this was already judged out-of-scope: how much
are we on the same page about this being a point to consider feedback?

If one wants to VOTE on more details, then this vote just doesn't
matter much. Is a future step to VOTE on some more detailed design
doc? Then that's what I call a "SPIP" and it's practically just
semantics.


On Sun, Mar 3, 2019 at 6:51 PM Xiangrui Meng  wrote:
>
> Hi Felix,
>
> Just to clarify, we are voting on the SPIP, not the companion scoping doc. 
> What is proposed and what we are voting on is to make Spark 
> accelerator-aware. The companion scoping doc and the design sketch are to 
> help demonstrate that what features could be implemented based on the use 
> cases and dev resources the co-authors are aware of. The exact scoping and 
> design would require more community involvement, by no means we are 
> finalizing it in this vote thread.
>
> I think copying the goals and non-goals from the companion scoping doc to the 
> SPIP caused the confusion. As mentioned in the SPIP, we proposed to make two 
> major changes at high level:
>
> At cluster manager level, we update or upgrade cluster managers to include 
> GPU support. Then we expose user interfaces for Spark to request GPUs from 
> them.
> Within Spark, we update its scheduler to understand available GPUs allocated 
> to executors, user task requests, and assign GPUs to tasks properly.
>
> We should keep our vote discussion at this level. It doesn't exclude 
> Mesos/Windows/TPU/FPGA, nor it commits to support YARN/K8s. Through the 
> initial scoping work, we found that we certainly need domain experts to 
> discuss the support of each cluster manager and each accelerator type. But 
> adding more details on Mesos or FPGA doesn't change the SPIP at high level. 
> So we concluded the initial scoping, shared the docs, and started this vote.


Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-03 Thread Felix Cheung
Great points Sean.

Here’s what I’d like to suggest to move forward.
Split the SPIP.

If we want to propose upfront homogeneous allocation (aka spark.task.gpus), 
this should be a proposal on its own, and, for instance, I really agree with Sean (like 
I did in the discuss thread) that we can’t simply non-goal Mesos. We have 
enough maintenance issues as it is. And IIRC there was a PR proposed for K8S; 
I’d like to see that discussion brought here as well.

IMO upfront allocation is less useful. Specifically too expensive for large 
jobs.

If we want per-stage resource requests, this should be a full SPIP with a lot more 
details to be hashed out. Our work with Horovod brings a few specific and 
critical requirements on how this should work with distributed DL, and I would 
like to see those addressed.

In any case, I’d like to see more consensus before moving forward; until then 
I’m going to -1 this.




From: Sean Owen 
Sent: Sunday, March 3, 2019 8:15 AM
To: Felix Cheung
Cc: Xingbo Jiang; Yinan Li; dev; Weichen Xu; Marco Gaido
Subject: Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

I'm for this in general, at least a +0. I do think this has to have a
story for what to do with the existing Mesos GPU support, which sounds
entirely like the spark.task.gpus config here. Maybe it's just a
synonym? that kind of thing.

Requesting different types of GPUs might be a bridge too far, but,
that's a P2 detail that can be hashed out later. (For example, if a
v100 is available and k80 was requested, do you use it or fail? is the
right level of resource control GPU RAM and cores?)

The per-stage resource requirements sounds like the biggest change;
you can even change CPU cores requested per pandas UDF? and what about
memory then? We'll see how that shakes out. That's the only thing I'm
kind of unsure about in this proposal.

On Sat, Mar 2, 2019 at 9:35 PM Felix Cheung  wrote:
>
> I’m very hesitant with this.
>
> I don’t want to vote -1, because I personally think it’s important to do, but 
> I’d like to see more discussion points addressed and not voting completely on 
> the spirit of it.
>
> First, SPIP doesn’t match the format of SPIP proposed and agreed on. (Maybe 
> this is a minor point and perhaps we should also vote to update the SPIP 
> format)
>
> Second, there are multiple pdf/google doc and JIRA. And I think for example 
> the design sketch is not covering the same points as the updated SPIP doc? It 
> would help to make them align before moving forward.
>
> Third, the proposal touches on some fairly core and sensitive components, 
> like the scheduler, and I think more discussions are necessary. We have a few 
> comments there and in the JIRA.
>
>
>
> 
> From: Marco Gaido 
> Sent: Saturday, March 2, 2019 4:18 AM
> To: Weichen Xu
> Cc: Yinan Li; Tom Graves; dev; Xingbo Jiang
> Subject: Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling
>
> +1, a critical feature for AI/DL!
>
> On Sat, Mar 2, 2019 at 05:14, Weichen Xu 
>  wrote:
>>
>> +1, nice feature!
>>
>> On Sat, Mar 2, 2019 at 6:11 AM Yinan Li  wrote:
>>>
>>> +1
>>>
>>> On Fri, Mar 1, 2019 at 12:37 PM Tom Graves  
>>> wrote:
>>>>
>>>> +1 for the SPIP.
>>>>
>>>> Tom
>>>>
>>>> On Friday, March 1, 2019, 8:14:43 AM CST, Xingbo Jiang 
>>>>  wrote:
>>>>
>>>>
>>>> Hi all,
>>>>
>>>> I want to call for a vote of SPARK-24615. It improves Spark by making it 
>>>> aware of GPUs exposed by cluster managers, and hence Spark can match GPU 
>>>> resources with user task requests properly. The proposal and production 
>>>> doc was made available on dev@ to collect input. Your can also find a 
>>>> design sketch at SPARK-27005.
>>>>
>>>> The vote will be up for the next 72 hours. Please reply with your vote:
>>>>
>>>> +1: Yeah, let's go forward and implement the SPIP.
>>>> +0: Don't really care.
>>>> -1: I don't think this is a good idea because of the following technical 
>>>> reasons.
>>>>
>>>> Thank you!
>>>>
>>>> Xingbo


Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

2019-03-02 Thread Felix Cheung
I’m very hesitant with this.

I don’t want to vote -1, because I personally think it’s important to do, but 
I’d like to see more discussion points addressed and not voting completely on 
the spirit of it.

First, SPIP doesn’t match the format of SPIP proposed and agreed on. (Maybe 
this is a minor point and perhaps we should also vote to update the SPIP format)

Second, there are multiple pdf/google doc and JIRA. And I think for example the 
design sketch is not covering the same points as the updated SPIP doc? It would 
help to make them align before moving forward.

Third, the proposal touches on some fairly core and sensitive components, like 
the scheduler, and I think more discussions are necessary. We have a few 
comments there and in the JIRA.




From: Marco Gaido 
Sent: Saturday, March 2, 2019 4:18 AM
To: Weichen Xu
Cc: Yinan Li; Tom Graves; dev; Xingbo Jiang
Subject: Re: [VOTE] [SPARK-24615] SPIP: Accelerator-aware Scheduling

+1, a critical feature for AI/DL!

On Sat, Mar 2, 2019 at 05:14, Weichen Xu 
<weichen...@databricks.com> wrote:
+1, nice feature!

On Sat, Mar 2, 2019 at 6:11 AM Yinan Li 
<liyinan...@gmail.com> wrote:
+1

On Fri, Mar 1, 2019 at 12:37 PM Tom Graves  wrote:
+1 for the SPIP.

Tom

On Friday, March 1, 2019, 8:14:43 AM CST, Xingbo Jiang 
<jiangxb1...@gmail.com> wrote:


Hi all,

I want to call for a vote on SPARK-24615. It improves Spark by making it aware 
of GPUs exposed by cluster managers, and hence Spark can match GPU resources 
with user task requests properly. The proposal and production doc were made 
available on dev@ to collect input. You can also find a design sketch at 
SPARK-27005.

The vote will be up for the next 72 hours. Please reply with your vote:

+1: Yeah, let's go forward and implement the SPIP.
+0: Don't really care.
-1: I don't think this is a good idea because of the following technical 
reasons.

Thank you!

Xingbo


[jira] [Created] (ZEPPELIN-4026) Doc should warn about anonymous access

2019-03-02 Thread Felix Cheung (JIRA)
Felix Cheung created ZEPPELIN-4026:
--

 Summary: Doc should warn about anonymous access
 Key: ZEPPELIN-4026
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-4026
 Project: Zeppelin
  Issue Type: Bug
Reporter: Felix Cheung






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (SPARK-26918) All .md should have ASF license header

2019-03-02 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16782565#comment-16782565
 ] 

Felix Cheung commented on SPARK-26918:
--

[~srowen] what do you think?

> All .md should have ASF license header
> --
>
> Key: SPARK-26918
> URL: https://issues.apache.org/jira/browse/SPARK-26918
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Felix Cheung
>Priority: Minor
>
> per policy, all md files should have the header, like eg. 
> [https://raw.githubusercontent.com/apache/arrow/master/docs/README.md]
>  or
> [https://raw.githubusercontent.com/apache/hadoop/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/filesystem.md]
>  
> currently it does not
> [https://raw.githubusercontent.com/apache/spark/master/docs/sql-reference.md] 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26918) All .md should have ASF license header

2019-03-02 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16782564#comment-16782564
 ] 

Felix Cheung commented on SPARK-26918:
--

I'm for doing this (Reopen this issue)

Also [~rmsm...@gmail.com] this needs to be 
 # on all .md files
 # remove the rat filter for .md, then after that
 # run the doc build to check the docs are generated properly

ie. at the beginning of the section "Update the Spark Website" 
https://spark.apache.org/release-process.html

{{$ cd docs }}

{{$ PRODUCTION=1 jekyll build }}

> All .md should have ASF license header
> --
>
> Key: SPARK-26918
> URL: https://issues.apache.org/jira/browse/SPARK-26918
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Felix Cheung
>Priority: Minor
>
> per policy, all md files should have the header, like eg. 
> [https://raw.githubusercontent.com/apache/arrow/master/docs/README.md]
>  or
> [https://raw.githubusercontent.com/apache/hadoop/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/filesystem.md]
>  
> currently it does not
> [https://raw.githubusercontent.com/apache/spark/master/docs/sql-reference.md] 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: SPIP: Accelerator-aware Scheduling

2019-03-02 Thread Felix Cheung
+1 on mesos - what Sean says


From: Andrew Melo 
Sent: Friday, March 1, 2019 9:19 AM
To: Xingbo Jiang
Cc: Sean Owen; Xiangrui Meng; dev
Subject: Re: SPIP: Accelerator-aware Scheduling

Hi,

On Fri, Mar 1, 2019 at 9:48 AM Xingbo Jiang  wrote:
>
> Hi Sean,
>
> To support GPU scheduling with a YARN cluster, we have to update the Hadoop 
> version to 3.1.2+. However, if we decide not to upgrade Hadoop to that version 
> or beyond for Spark 3.0, then we just have to disable or fall back the GPU 
> scheduling with YARN; users shall still be able to have that feature with a 
> Standalone or Kubernetes cluster.
>
> We didn't include Mesos support in the current SPIP because we didn't receive 
> use cases that require GPU scheduling on a Mesos cluster; however, we can still 
> add Mesos support in the future if we observe valid use cases.

First time caller, long time listener. We have GPUs in our Mesos-based
Spark cluster, and it would be nice to use them with Spark-based
GPU-enabled frameworks (our use case is deep learning applications).

Cheers
Andrew

>
> Thanks!
>
> Xingbo
>
Sean Owen wrote on Friday, March 1, 2019 at 10:39 PM:
>>
>> Two late breaking questions:
>>
>> This basically requires Hadoop 3.1 for YARN support?
>> Mesos support is listed as a non goal but it already has support for 
>> requesting GPUs in Spark. That would be 'harmonized' with this 
>> implementation even if it's not extended?
>>
>> On Fri, Mar 1, 2019, 7:48 AM Xingbo Jiang  wrote:
>>>
>>> I think we are aligned on the commitment, I'll start a vote thread for this 
>>> shortly.
>>>
 Xiangrui Meng wrote on Wednesday, February 27, 2019 at 6:47 AM:

 In case there are issues visiting Google doc, I attached PDF files to the 
 JIRA.

 On Tue, Feb 26, 2019 at 7:41 AM Xingbo Jiang  wrote:
>
> Hi all,
>
> I want to send a revised SPIP on implementing Accelerator(GPU)-aware 
> Scheduling. It improves Spark by making it aware of GPUs exposed by 
> cluster managers, and hence Spark can match GPU resources with user task 
> requests properly. If you have scenarios that need to run 
> workloads(DL/ML/Signal Processing etc.) on Spark cluster with GPU nodes, 
> please help review and check how it fits into your use cases. Your 
> feedback would be greatly appreciated!
>
> # Links to SPIP and Product doc:
>
> * Jira issue for the SPIP: 
> https://issues.apache.org/jira/browse/SPARK-24615
> * Google Doc: 
> https://docs.google.com/document/d/1C4J_BPOcSCJc58HL7JfHtIzHrjU0rLRdQM3y7ejil64/edit?usp=sharing
> * Product Doc: 
> https://docs.google.com/document/d/12JjloksHCdslMXhdVZ3xY5l1Nde3HRhIrqvzGnK_bNE/edit?usp=sharing
>
> Thank you!
>
> Xingbo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Release Apache Pinot (incubating) 0.1.0 RC0

2019-03-02 Thread Felix Cheung
+1 (binding)

Carrying my vote from dev@

checked license headers
compiled from source, ran tests, demo
checked name includes incubating
checked DISCLAIMER, LICENSE and NOTICE
checked signature and hashes
checked no unexpected binary files


On Thu, Feb 28, 2019 at 10:45 AM Seunghyun Lee  wrote:

> Hi all,
>
> This is a call for vote to the release Apache Pinot (incubating) version
> 0.1.0.
>
> Apache Pinot (incubating) is a distributed columnar storage engine that can
> ingest
> data in realtime and serve analytical queries at low latency.
>
> Pinot community has voted and approved this release.
>
> Vote thread:
>
> https://lists.apache.org/thread.html/f136d3eaa9dfbab6e17b262a5542813099f2b128465d9d17ef69defd@%3Cdev.pinot.apache.org%3E
>
> Result thread:
>
> https://lists.apache.org/thread.html/ce58034678349f82afc5f8ed2edd435875301183554f964778dffb7a@%3Cdev.pinot.apache.org%3E
>
> The release candidate:
>
> https://dist.apache.org/repos/dist/dev/incubator/pinot/apache-pinot-incubating-0.1.0-rc0
>
> Git tag for this release:
> https://github.com/apache/incubator-pinot/tree/release-0.1.0-rc0
>
> Git hash for this release:
> bbf29dc6e0f23383948f0db66565ebbdf383dd0d
>
> The artifacts have been signed with key: 44BA03AD164D961B, which can be
> found in the following KEYS file.
> https://dist.apache.org/repos/dist/release/incubator/pinot/KEYS
>
> Release notes:
> https://github.com/apache/incubator-pinot/releases/tag/release-0.1.0-rc0
>
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachepinot-1002
>
> Documentation on verifying a release candidate:
>
> https://cwiki.apache.org/confluence/display/PINOT/Validating+a+release+candidate
>
>
> The vote will be open for at least 72 hours or until necessary number of
> votes are reached.
>
> Please vote accordingly,
>
> [ ] +1 approve
> [ ] +0 no opinion
> [ ] -1 disapprove with the reason
>
> Thanks,
> Apache Pinot (incubating) team
>


Re: Help with permissions to cwiki pages

2019-03-02 Thread Felix Cheung
I saw the other comment and the INFRA JIRA.


My understanding would be that you would need to ensure the user has
permission in the “space” first.



On Fri, Mar 1, 2019 at 2:48 PM Felix Cheung  wrote:

> I missed the part about design doc
>
> My suggestion for that is to open JIRA and link to pdf or google doc on
> that.
>
>
> On Fri, Mar 1, 2019 at 2:44 PM Felix Cheung 
> wrote:
>
>> I’ve mentioned this on the pinot site PR. My suggestion is not to use
>> wiki for this.
>>
>> Use the site and anyone in the community can create new PR to update.
>>
>>
>> On Fri, Mar 1, 2019 at 1:58 PM Subbu Subramaniam <
>> ssubraman...@linkedin.com> wrote:
>>
>>> Hi Felix,
>>>
>>> In the Pinot project we decided to use cwiki for design documents. We
>>> need to collaborate across companies on these, so we are trying to give
>>> access to specific pages to engineers in the community who are working on
>>> that feature.
>>>
>>> But then they are not able to edit the pages even though we have given
>>> them permission (See https://issues.apache.org/jira/browse/INFRA-17912).
>>> Can you help us with this? Is this a valid usage pattern?
>>>
>>> -Subbu
>>>
>>


Re: Help with permissions to cwiki pages

2019-03-01 Thread Felix Cheung
I missed the part about design doc

My suggestion for that is to open JIRA and link to pdf or google doc on
that.


On Fri, Mar 1, 2019 at 2:44 PM Felix Cheung  wrote:

> I’ve mentioned this on the pinot site PR. My suggestion is not to use wiki
> for this.
>
> Use the site and anyone in the community can create new PR to update.
>
>
> On Fri, Mar 1, 2019 at 1:58 PM Subbu Subramaniam <
> ssubraman...@linkedin.com> wrote:
>
>> Hi Felix,
>>
>> In the Pinot project we decided to use cwiki for design documents. We
>> need to collaborate across companies on these, so we are trying to give
>> access to specific pages to engineers in the community who are working on
>> that feature.
>>
>> But then they are not able to edit the pages even though we have given
>> them permission (See https://issues.apache.org/jira/browse/INFRA-17912).
>> Can you help us with this? Is this a valid usage pattern?
>>
>> -Subbu
>>
>


Re: Windows Supports

2019-02-28 Thread Felix Cheung
OK, but that was the point about AppVeyor as CI: it’s not hard to set up.



From: Jongyoul Lee 
Sent: Tuesday, February 26, 2019 11:12 PM
To: users
Subject: Re: Windows Supports

@Felix
What I meant was the case of running Zeppelin natively in a Windows environment, 
without Docker or a virtual Linux environment. People can run Zeppelin in those 
other ways, but when they run it natively we haven't tested that case and don't 
know the potential problems. We could guide Windows users to use a Docker 
container or a virtual Linux environment by default instead of the native scripts.

@Jeff,
I also thought about running interpreters on Windows. I don't think it's easy to 
set up CI for Windows.

Basically, I agree that the best way is to support Windows well. But the more 
important thing is to keep Windows users' UX. WDYT?

On Wed, Feb 27, 2019 at 12:53 AM Jeff Zhang 
mailto:zjf...@gmail.com>> wrote:
I think the issue is about running the Spark interpreter on Windows. This is due to 
some script changes in the interpreter launch script interpreter.sh that were not 
applied to interpreter.cmd. We could still support Windows by fixing this issue, 
but I don't have time for it right now. I would appreciate it if someone else 
could help with this, and also set up CI on AppVeyor.

Thomas Bernhardt mailto:bernhardt...@yahoo.com>> 
于2019年2月26日周二 下午8:12写道:
We had no trouble running 0.8.0 on Windows 10 Professional. We even set up 
authentication. Maybe our case is special, however, since we don't use any of the 
provided interpreters and only have our own interpreter.
-Tom

On Monday, February 25, 2019, 9:29:14 PM EST, Jongyoul Lee 
mailto:jongy...@gmail.com>> wrote:


Hi Dev and Users,

Recently, personally, I've got reports that Z couldn't run under Windows' 
environments.

I think we need to discuss how to handle issues supporting windows.

AFAIK, there are not many resources to test Z under Windows by committers or 
contributors. If we couldn't support Windows well, how about removing bin/*.cmd 
and focusing on alternatives like dockers.

WDYT?

JL

--
이종열, Jongyoul Lee, 李宗烈
http://madeng.net


--
Best Regards

Jeff Zhang


--
이종열, Jongyoul Lee, 李宗烈
http://madeng.net


Re: [VOTE] Accept Apache TVM into the incubator

2019-02-28 Thread Felix Cheung
+1 (binding)

On Wed, Feb 27, 2019 at 10:17 PM kellen sunderland <
kellen.sunderl...@gmail.com> wrote:

> +1 (non-binding)
>
> On Wed, Feb 27, 2019, 10:15 PM Furkan KAMACI 
> wrote:
>
> > +1 (binding)
> >
> > 28 Şub 2019 Per, saat 08:40 tarihinde Henry Saputra <
> > henry.sapu...@gmail.com>
> > şunu yazdı:
> >
> > > +1 (binding)
> > >
> > > On Wed, Feb 27, 2019 at 8:44 PM Markus Weimer 
> wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > we've discussed the proposal for the TVM project in [1]. The proposal
> > > > itself can
> > > > be found on the wiki [2].
> > > >
> > > > According to the Incubator rules[3] I'd like to call a vote to accept
> > the
> > > > new
> > > > TVM project as a podling in the Apache Incubator.
> > > >
> > > > A vote for accepting a new Apache Incubator podling is a majority
> vote.
> > > > Everyone
> > > > is welcome to vote, only Incubator PMC member votes are binding. It
> > would
> > > > be
> > > > helpful (but not required) if you could add a comment stating whether
> > > your
> > > > vote
> > > > is binding or non-binding.
> > > >
> > > > This vote will run for at least 72 hours (but I expect to keep it
> open
> > > for
> > > > longer). Please VOTE as follows:
> > > >
> > > > [ ] +1 Accept TVM into the Apache Incubator
> > > > [ ] +0 Abstain
> > > > [ ] -1 Do not accept TVM into the Apache Incubator because ...
> > > >
> > > > Thank you for everyone who decided to join in in the past
> discussions!
> > > >
> > > > Markus
> > > >
> > > > [1]:
> > > >
> > >
> >
> https://lists.apache.org/thread.html/e2b1fe9ca76422ec80b146a6b120091f2419e2f1c27d57080f39cf6f@%3Cgeneral.incubator.apache.org%3E
> > > >
> > > > [2]: https://wiki.apache.org/incubator/TVMProposal
> > > >
> > > > [3]: https://incubator.apache.org/guides/proposal.html#the_vote
> > > >
> > > > -
> > > > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > > > For additional commands, e-mail: general-h...@incubator.apache.org
> > > >
> > > >
> > >
> >
>


Re: Pinot Incubator PMC report for March 2019 has been updated

2019-02-25 Thread Felix Cheung
LG. Thx for putting this together.

About “updated LICENSE/NOTICE” - wasn’t that in the last month ie the timeframe 
for the last report?


From: Seunghyun Lee 
Sent: Monday, February 25, 2019 9:09 PM
To: dev@pinot.apache.org
Subject: Pinot Incubator PMC report for March 2019 has been updated

Hi all,

I have updated the incubator PMC report for Pinot for March 2019.
https://wiki.apache.org/incubator/March2019

Can someone go over the report and provide me a feedback?

Best,
Seunghyun


Re: Windows Supports

2019-02-25 Thread Felix Cheung
Testing on Windows can also be done as CI on AppVeyor.

I don’t completely get your comment on the .cmd files though. Are you suggesting we 
don’t support Windows and users can “run on Windows” by basically running Linux 
in a virtual environment? Docker is one option, and there is Linux on Windows: 
https://docs.microsoft.com/en-us/windows/wsl/install-win10



From: Jongyoul Lee 
Sent: Monday, February 25, 2019 6:29 PM
To: dev; users
Subject: Windows Supports

Hi Dev and Users,

Recently, personally, I've got reports that Z couldn't run under Windows'
environments.

I think we need to discuss how to handle issues supporting windows.

AFAIK, there are not many resources to test Z under Windows by committers
or contributors. If we couldn't support Windows well, how about removing
bin/*.cmd and focusing on alternatives like dockers.

WDYT?

JL

--
이종열, Jongyoul Lee, 李宗烈
http://madeng.net


Re: [DISCUSS][SQL][PySpark] Column name support for SQL functions

2019-02-24 Thread Felix Cheung
I hear three topics in this thread

1. I don’t think we should remove string. Column and string can both be “type 
safe”. And I would agree we don’t *need* to break API compatibility here.

2. Gaps in python API. Extending on #1, definitely we should be consistent and 
add string as param where it is missed.

3. Scala API for string - hard to say but make sense if nothing but for 
consistency. Though I can also see the argument of Column only in Scala. String 
might be more natural in python and much less significant in Scala because of 
$”foo” notation.

(My 2 c)



From: Sean Owen 
Sent: Sunday, February 24, 2019 6:59 AM
To: André Mello
Cc: dev
Subject: Re: [DISCUSS][SQL][PySpark] Column name support for SQL functions

I just commented on the PR -- I personally don't think it's worth
removing support for, say, max("foo") over max(col("foo")) or
max($"foo") in Scala. We can make breaking changes in Spark 3 but this
seems like it would unnecessarily break a lot of code. The string arg
is more concise in Python and I can't think of cases where it's
particularly ambiguous or confusing; on the contrary it's more natural
coming from SQL.

What we do have are inconsistencies and errors in support of string vs
Column as fixed in the PR. I was surprised to see that
df.select(abs("col")) throws an error while df.select(sqrt("col"))
doesn't. I think that's easy to fix on the Python side. Really I think
the question is: do we need to add methods like "def abs(String)" and
more in Scala? that would remain inconsistent even if the Pyspark side
is fixed.
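
For illustration, here is a minimal PySpark sketch of the two calling patterns
being discussed; the DataFrame and the column name "foo" are hypothetical and
only meant to show the inconsistency:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, max as max_, sqrt

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1.0,), (4.0,)], ["foo"])

    # Most functions accept either a column name string or a Column object:
    df.select(max_("foo"), max_(col("foo"))).show()

    # Functions wired through _create_function pass the string straight to the
    # JVM, so sqrt("foo") happens to work (Scala has a String overload) while
    # e.g. abs("foo") raises an error in 2.4 and needs abs(col("foo")) instead.
    df.select(sqrt("foo"), sqrt(col("foo"))).show()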

On Sun, Feb 24, 2019 at 8:54 AM André Mello  wrote:
>
> # Context
>
> This comes from [SPARK-26979], which became PR #23879 and then PR
> #23882. The following reflects all the findings made so far.
>
> # Description
>
> Currently, in the Scala API, some SQL functions have two overloads,
> one taking a string that names the column to be operated on, the other
> taking a proper Column object. This allows for two patterns of calling
> these functions, which is a source of inconsistency and generates
> confusion for new users, since it is hard to predict which functions
> will take a column name or not.
>
> The PySpark API partially solves this problem by internally converting
> the argument to a Column object prior to passing it through to the
> underlying JVM implementation. This allows for a consistent use of
> name literals across the API, except for a few violations:
>
> - lower()
> - upper()
> - abs()
> - bitwiseNOT()
> - ltrim()
> - rtrim()
> - trim()
> - ascii()
> - base64()
> - unbase64()
>
> These violations happen because for a subset of the SQL functions,
> PySpark uses a functional mechanism (`_create_function`) to directly
> call the underlying JVM equivalent by name, thus skipping the
> conversion step. In most cases the column name pattern still works
> because the Scala API has its own support for string arguments, but
> the aforementioned functions are also exceptions there.
>
> My proposal was to solve this problem by adding the string support
> where it was missing in the PySpark API. Since this is a purely
> additive change, it doesn't break past code. Additionally, I find the
> API sugar to be a positive feature, since code like `max("foo")` is
> more concise and readable than `max(col("foo"))`. It adheres to the
> DRY philosophy and is consistent with Python's preference for
> readability over type protection.
>
> However, upon submission of the PR, a discussion was started about
> whether it wouldn't be better to entirely deprecate string support
> instead - in particular with major release 3.0 in mind. The reasoning,
> as I understood it, was that this approach is more explicit and type
> safe, which is preferred in Java/Scala, plus it reduces the API
> surface area - and the Python API should be consistent with the others
> as well.
>
> Upon request by @HyukjinKwon I'm submitting this matter for discussion
> by this mailing list.
>
> # Summary
>
> There is a problem with inconsistency in the Scala/Python SQL API,
> where sometimes you can use a column name string as a proxy, and
> sometimes you have to use a proper Column object. To solve it there
> are two approaches - to remove the string support entirely, or to add
> it where it is missing. Which approach is best?
>
> Hope this is clear.
>
> -- André.
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Accept Cava into the Apache Incubator

2019-02-24 Thread Felix Cheung
+1

On Sat, Feb 23, 2019 at 11:39 PM Vinayakumar B 
wrote:

> +1
>
> -Vinay
>
>
>
> On Fri, 22 Feb 2019, 7:22 am Huxing Zhang,  wrote:
>
> > +1 (non-binding)
> >
> > On Thu, Feb 21, 2019 at 3:50 AM Antoine Toulme 
> > wrote:
> > >
> > > Hi everyone,
> > >
> > > we've discussed the proposal for the Cava project in [1] and [2]. The
> > > proposal itself can be found on the wiki[3].
> > >
> > > We discussed how to go about finding a suitable name for the project in
> > [2].
> > > I will kick off a vote to pick a name based on the proposals made
> there.
> > >
> > > According to the Incubator rules[4] I'd like to call a vote to accept
> the
> > > new "Cava" project as a podling in the Apache Incubator.
> > >
> > > A vote for accepting a new Apache Incubator podling is a majority vote.
> > > Everyone is welcome to vote, only Incubator PMC member votes are
> binding.
> > > It would be helpful (but not required) if you could add a comment
> stating
> > > whether your vote is binding or non-binding.
> > >
> > > This vote will run for at least 72 hours (but I expect to keep it open
> > for
> > > longer). Please VOTE as follows:
> > >
> > > [ ] +1 Accept Cava into the Apache Incubator
> > > [ ] +0 Abstain
> > > [ ] -1 Do not accept Cava into the Apache Incubator because ...
> > >
> > > Thank you for everyone who decided to join in in the past discussions!
> > > Antoine
> > >
> > > [1]:
> >
> https://lists.apache.org/thread.html/5a7f6a218b11a1cac61fbd53f4c995fd7716f8ad3751cf9f171ebd57@%3Cgeneral.incubator.apache.org%3E
> > > [2]:
> >
> https://lists.apache.org/thread.html/8d8014f53f140a3ccdd517c3c303de1d45cc04afdaee5961ac43e7fc@%3Cgeneral.incubator.apache.org%3E
> > > [3]:
> https://wiki.apache.org/incubator/CavaProposal?action=recall=14
> > > [4]: https://incubator.apache.org/guides/proposal.html#the_vote
> > > -
> > > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > > For additional commands, e-mail: general-h...@incubator.apache.org
> > >
> >
> >
> > --
> > Best Regards!
> > Huxing
> >
> > -
> > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > For additional commands, e-mail: general-h...@incubator.apache.org
> >
> >
>


[jira] [Comment Edited] (SPARK-26918) All .md should have ASF license header

2019-02-23 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16776080#comment-16776080
 ] 

Felix Cheung edited comment on SPARK-26918 at 2/24/19 12:36 AM:


actually, it does - it's a comment in ASF incubator that spark is setting the 
wrong example for all .md files. it's likely better to fix this before more 
direct feedback coming down.

[https://www.apache.org/legal/src-headers.html#headers]

[https://www.apache.org/legal/src-headers.html#faq-docs]


was (Author: felixcheung):
actually, it does - it's a comment in ASF incubator that spark is setting the 
wrong example for all .md files

https://www.apache.org/legal/src-headers.html#headers

https://www.apache.org/legal/src-headers.html#faq-docs

> All .md should have ASF license header
> --
>
> Key: SPARK-26918
> URL: https://issues.apache.org/jira/browse/SPARK-26918
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Felix Cheung
>Priority: Minor
>
> per policy, all md files should have the header, like eg. 
> [https://raw.githubusercontent.com/apache/arrow/master/docs/README.md]
>  or
> [https://raw.githubusercontent.com/apache/hadoop/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/filesystem.md]
>  
> currently it does not
> [https://raw.githubusercontent.com/apache/spark/master/docs/sql-reference.md] 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26918) All .md should have ASF license header

2019-02-23 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16776080#comment-16776080
 ] 

Felix Cheung commented on SPARK-26918:
--

actually, it does - it's a comment in ASF incubator that spark is setting the 
wrong example for all .md files

https://www.apache.org/legal/src-headers.html#headers

https://www.apache.org/legal/src-headers.html#faq-docs
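
For reference, the header in question is the standard ASF source header from the
pages above; in Markdown files it is typically wrapped in an HTML comment so that
it does not show up in the rendered page, roughly:

    <!--
    Licensed to the Apache Software Foundation (ASF) under one
    or more contributor license agreements.  See the NOTICE file
    distributed with this work for additional information
    regarding copyright ownership.  The ASF licenses this file
    to you under the Apache License, Version 2.0 (the
    "License"); you may not use this file except in compliance
    with the License.  You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing,
    software distributed under the License is distributed on an
    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    KIND, either express or implied.  See the License for the
    specific language governing permissions and limitations
    under the License.
    -->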

> All .md should have ASF license header
> --
>
> Key: SPARK-26918
> URL: https://issues.apache.org/jira/browse/SPARK-26918
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Felix Cheung
>Priority: Minor
>
> per policy, all md files should have the header, like eg. 
> [https://raw.githubusercontent.com/apache/arrow/master/docs/README.md]
>  or
> [https://raw.githubusercontent.com/apache/hadoop/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/filesystem.md]
>  
> currently it does not
> [https://raw.githubusercontent.com/apache/spark/master/docs/sql-reference.md] 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: [VOTE] Apache Pinot (incubating) 0.1.0 RC0

2019-02-22 Thread Felix Cheung
Hi, what’s the next step on this?



From: Sunitha Beeram 
Sent: Tuesday, February 19, 2019 5:11 PM
To: Seunghyun Lee; dev@pinot.apache.org
Subject: Re: [VOTE] Apache Pinot (incubating) 0.1.0 RC0

+1


Checked out the code on Linux and verified the steps. (Had a few wiki updates 
that I mentioned to Seunghyun Lee offline)


From: Subbu Subramaniam 
Sent: Tuesday, February 19, 2019 4:12:28 PM
To: Seunghyun Lee; dev@pinot.apache.org
Subject: Re: [VOTE] Apache Pinot (incubating) 0.1.0 RC0

+1 All checked out after the key was fixed.

Some suggestions:

* We should add verification for realtime also.
* We should verify from binary distribution rather than the src that has been 
built? (not sure about this)

Otherwise, lgtm

-Subbu

From: Seunghyun Lee 
Sent: Tuesday, February 19, 2019 1:54 PM
To: dev@pinot.apache.org
Subject: Re: [VOTE] Apache Pinot (incubating) 0.1.0 RC0

By the way, there actually was an issue with KEYS file. I fixed that as
well. Thanks for pointing it out :)

Best,
Seunghyun


On Mon, Feb 18, 2019 at 9:43 PM Seunghyun Lee  wrote:

> Hi Felix,
>
> Thanks for the detailed feedback. From the next release, I will make sure
> to use 4096 bits key.
>
> I have modified the wiki page for those 2 points. Regarding importing KEYS
> file, I have tried with my personal laptop and it works fine with it.
>
> Can other people comment on importing KEYS file while validating a release?
>
> Best,
> Seunghyun
>
> On Fri, Feb 15, 2019 at 10:44 PM Felix Cheung 
> wrote:
>
>> +1
>>
>> All checked out - please see note on key and wiki.
>> Thanks for putting this together.
>>
>> Note - ideally, signing key should be 4096 bits
>> https://www.apache.org/dev/release-signing.html#note
>> checked license headers
>> compiled from source, ran tests, demo
>> checked name includes incubating
>> checked DISCLAIMER, LICENSE and NOTICE
>> checked signature and hashes
>> checked no unexpected binary files
>>
>>
>> For some reason I wasn't able to import from KEYS file correctly:
>>
>> $ cat KEYS
>> pub rsa2048 2019-02-01 [SC] [expires: 2021-01-31]
>> FD534854D542FD474278B85344BA03AD164D961B
>> uid [ultimate] Seunghyun Lee 
>> sig 3 44BA03AD164D961B 2019-02-01 Seunghyun Lee 
>> sub rsa2048 2019-02-01 [E] [expires: 2021-01-31]
>> sig 44BA03AD164D961B 2019-02-01 Seunghyun Lee 
>>
>> -BEGIN PGP PUBLIC KEY BLOCK-
>> ...
>> -END PGP PUBLIC KEY BLOCK-
>>
>> $ gpg --import KEYS
>> gpg: key 6E106A1A5681D67E: public key "Seunghyun Lee "
>> imported
>> gpg: Total number processed: 1
>> gpg: imported: 1
>>
>> # Note - the wrong key is imported!
>> # Whereas this works
>>
>> $ gpg --recv-keys FD534854D542FD474278B85344BA03AD164D961B
>> gpg: key 44BA03AD164D961B: public key "Seunghyun Lee "
>> imported
>> gpg: Total number processed: 1
>> gpg: imported: 1
>>
>>
>> Note -
>> about the wiki
>> https://cwiki.apache.org/confluence/display/PINOT/Validating+a+release+candidate
>>
>> 1. Download the release candidate
>> - FYI this can also be done via http,
>> https://dist.apache.org/repos/dist/dev/incubator/pinot/apache-pinot-incubating-0.1.0-rc0/apache-pinot-incubating-0.1.0-src.tar.gz
>>
>> 2. demo
>> cd pinot-distribution/target/apache-pinot-incubating-x.x.x-bin
>>
>> seems like should be
>> cd
>> pinot-distribution/target/apache-pinot-incubating-0.1.0-bin/apache-pinot-incubating-0.1.0-bin
>>
>>
>>
>> 
>> From: Seunghyun Lee 
>> Sent: Thursday, February 14, 2019 10:46 PM
>> To: dev@pinot.apache.org
>> Subject: [VOTE] Apache Pinot (incubating) 0.1.0 RC0
>>
>> Hi Pinot Community,
>>
>> This is a call for vote to the release Apache Pinot (incubating) version
>> 0.1.0.
>>
>> The release candidate:
>>
>> https://dist.apache.org/repos/dist/dev/incubator/pinot/apache-pinot-incubating-0.1.0-rc0
>>
>> Git tag for this release:
>> https://github.com/apache/incubator-pinot/tree/release-0.1.0-rc0
>>
>> Git hash for this release:
>> bbf29dc6e0f23383948f0db66565ebbdf383dd0d
>>
>> The artifacts have been signed with key: 44BA03AD164D961B, which can be
>> found in the following KEYS file.
>> https://dist.apache.org/repos/dist/release/incubator/pinot/KEYS
>>
>> Release notes:
>> https://github.com/apache/incubator-pinot/releases/tag/release-0.1.0-rc0
>>
>> Staging repository:
>> https://repository.apache.org/content/repositories/orgapachepinot-1002
>>
>> Documentation on verifying a release candidate:
>>
>> https://cwiki.apache.org/confluence/display/PINOT/Validating+a+release+candidate
>>
>>
>> The vote will be open for at least 72 hours or until necessary number of
>> votes are reached.
>>
>> Please vote accordingly,
>>
>> [ ] +1 approve
>> [ ] +0 no opinion
>> [ ] -1 disapprove with the reason
>>
>> Thanks,
>> Apache Pinot (incubating) team
>>
>


Re: Spark-hive integration on HDInsight

2019-02-21 Thread Felix Cheung
You should check with HDInsight support


From: Jay Singh 
Sent: Wednesday, February 20, 2019 11:43:23 PM
To: User
Subject: Spark-hive integration on HDInsight

I am trying to integrate Spark with Hive on an HDInsight Spark cluster.
I copied hive-site.xml into the spark/conf directory. In addition, I added Hive 
metastore properties such as the JDBC connection info in Ambari as well. But the 
databases and tables created using spark-sql are still not visible in Hive. I also 
changed the ‘spark.sql.warehouse.dir’ value to point to the Hive warehouse directory.
Spark does work with Hive when LLAP is not on. What am I missing in the 
configuration to integrate Spark with Hive? Any pointer will be appreciated.

thx
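
For what it's worth, here is a minimal, HDInsight-agnostic PySpark sketch for
checking whether a session is really talking to the external Hive metastore; the
app name and the checks are illustrative and not from the original setup:

    from pyspark.sql import SparkSession

    # Hive support has to be enabled on the session, and hive-site.xml must be
    # visible to the driver (e.g. under $SPARK_CONF_DIR); otherwise Spark falls
    # back to a local Derby-backed metastore and nothing shows up in Hive.
    spark = (SparkSession.builder
             .appName("hive-metastore-check")
             .enableHiveSupport()
             .getOrCreate())

    # If the external metastore is in use, this lists the same databases the
    # Hive CLI sees, and the warehouse dir matches the hive-site.xml value.
    spark.sql("SHOW DATABASES").show()
    print(spark.conf.get("spark.sql.warehouse.dir"))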


Re: Questions about deleting branches on Apache Jenkins

2019-02-21 Thread Felix Cheung
Can you point to where you see these branches? Is it on asf Jenkins?



From: yi xu 
Sent: Thursday, February 21, 2019 12:52 AM
To: dev@iotdb.apache.org
Subject: Questions about deleting branches on Apache Jenkins

Hi,

I’m curious about the branch detection of our Jenkins pipeline. If I create a 
new branch and push it to remote repo, I can find my branch on jenkins 
pipeline. However, if I delete this branch on remote repo, jenkins pipeline 
won’t delete it. Now there are lots of useless branches piled up on jenkins.

Can someone help to delete them?

Thanks
XuYi


Re: [VOTE] Release Apache Spark 2.4.1 (RC2)

2019-02-21 Thread Felix Cheung
I merged the fix to 2.4.



From: Felix Cheung 
Sent: Wednesday, February 20, 2019 9:34 PM
To: DB Tsai; Spark dev list
Cc: Cesar Delgado
Subject: Re: [VOTE] Release Apache Spark 2.4.1 (RC2)

Could you hold for a bit - I have one more fix to get in



From: d_t...@apple.com on behalf of DB Tsai 
Sent: Wednesday, February 20, 2019 12:25 PM
To: Spark dev list
Cc: Cesar Delgado
Subject: Re: [VOTE] Release Apache Spark 2.4.1 (RC2)

Okay. Let's fail rc2, and I'll prepare rc3 with SPARK-26859.

DB Tsai | Siri Open Source Technologies [not a contribution] |  Apple, Inc

> On Feb 20, 2019, at 12:11 PM, Marcelo Vanzin  
> wrote:
>
> Just wanted to point out that
> https://issues.apache.org/jira/browse/SPARK-26859 is not in this RC,
> and is marked as a correctness bug. (The fix is in the 2.4 branch,
> just not in rc2.)
>
> On Wed, Feb 20, 2019 at 12:07 PM DB Tsai  wrote:
>>
>> Please vote on releasing the following candidate as Apache Spark version 
>> 2.4.1.
>>
>> The vote is open until Feb 24 PST and passes if a majority +1 PMC votes are 
>> cast, with
>> a minimum of 3 +1 votes.
>>
>> [ ] +1 Release this package as Apache Spark 2.4.1
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v2.4.1-rc2 (commit 
>> 229ad524cfd3f74dd7aa5fc9ba841ae223caa960):
>> https://github.com/apache/spark/tree/v2.4.1-rc2
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v2.4.1-rc2-bin/
>>
>> Signatures used for Spark RCs can be found in this file:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1299/
>>
>> The documentation corresponding to this release can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v2.4.1-rc2-docs/
>>
>> The list of bug fixes going into 2.4.1 can be found at the following URL:
>> https://issues.apache.org/jira/projects/SPARK/versions/2.4.1
>>
>> FAQ
>>
>> =
>> How can I help test this release?
>> =
>>
>> If you are a Spark user, you can help us test this release by taking
>> an existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> If you're working in PySpark you can set up a virtual env and install
>> the current RC and see if anything important breaks, in the Java/Scala
>> you can add the staging repository to your projects resolvers and test
>> with the RC (make sure to clean up the artifact cache before/after so
>> you don't end up building with a out of date RC going forward).
>>
>> ===
>> What should happen to JIRA tickets still targeting 2.4.1?
>> ===
>>
>> The current list of open tickets targeted at 2.4.1 can be found at:
>> https://issues.apache.org/jira/projects/SPARK and search for "Target 
>> Version/s" = 2.4.1
>>
>> Committers should look at those and triage. Extremely important bug
>> fixes, documentation, and API tweaks that impact compatibility should
>> be worked on immediately. Everything else please retarget to an
>> appropriate release.
>>
>> ==
>> But my bug isn't fixed?
>> ==
>>
>> In order to make timely releases, we will typically not hold the
>> release unless the bug in question is a regression from the previous
>> release. That being said, if there is something which is a regression
>> that has not been correctly targeted please ping me or a committer to
>> help target the issue.
>>
>>
>> DB Tsai | Siri Open Source Technologies [not a contribution] |  Apple, Inc
>>
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>
>
> --
> Marcelo
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>


-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Release Apache Spark 2.4.1 (RC2)

2019-02-20 Thread Felix Cheung
Could you hold for a bit - I have one more fix to get in



From: d_t...@apple.com on behalf of DB Tsai 
Sent: Wednesday, February 20, 2019 12:25 PM
To: Spark dev list
Cc: Cesar Delgado
Subject: Re: [VOTE] Release Apache Spark 2.4.1 (RC2)

Okay. Let's fail rc2, and I'll prepare rc3 with SPARK-26859.

DB Tsai | Siri Open Source Technologies [not a contribution] |  Apple, Inc

> On Feb 20, 2019, at 12:11 PM, Marcelo Vanzin  
> wrote:
>
> Just wanted to point out that
> https://issues.apache.org/jira/browse/SPARK-26859 is not in this RC,
> and is marked as a correctness bug. (The fix is in the 2.4 branch,
> just not in rc2.)
>
> On Wed, Feb 20, 2019 at 12:07 PM DB Tsai  wrote:
>>
>> Please vote on releasing the following candidate as Apache Spark version 
>> 2.4.1.
>>
>> The vote is open until Feb 24 PST and passes if a majority +1 PMC votes are 
>> cast, with
>> a minimum of 3 +1 votes.
>>
>> [ ] +1 Release this package as Apache Spark 2.4.1
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v2.4.1-rc2 (commit 
>> 229ad524cfd3f74dd7aa5fc9ba841ae223caa960):
>> https://github.com/apache/spark/tree/v2.4.1-rc2
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v2.4.1-rc2-bin/
>>
>> Signatures used for Spark RCs can be found in this file:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1299/
>>
>> The documentation corresponding to this release can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v2.4.1-rc2-docs/
>>
>> The list of bug fixes going into 2.4.1 can be found at the following URL:
>> https://issues.apache.org/jira/projects/SPARK/versions/2.4.1
>>
>> FAQ
>>
>> =
>> How can I help test this release?
>> =
>>
>> If you are a Spark user, you can help us test this release by taking
>> an existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> If you're working in PySpark you can set up a virtual env and install
>> the current RC and see if anything important breaks, in the Java/Scala
>> you can add the staging repository to your projects resolvers and test
>> with the RC (make sure to clean up the artifact cache before/after so
>> you don't end up building with a out of date RC going forward).
>>
>> ===
>> What should happen to JIRA tickets still targeting 2.4.1?
>> ===
>>
>> The current list of open tickets targeted at 2.4.1 can be found at:
>> https://issues.apache.org/jira/projects/SPARK and search for "Target 
>> Version/s" = 2.4.1
>>
>> Committers should look at those and triage. Extremely important bug
>> fixes, documentation, and API tweaks that impact compatibility should
>> be worked on immediately. Everything else please retarget to an
>> appropriate release.
>>
>> ==
>> But my bug isn't fixed?
>> ==
>>
>> In order to make timely releases, we will typically not hold the
>> release unless the bug in question is a regression from the previous
>> release. That being said, if there is something which is a regression
>> that has not been correctly targeted please ping me or a committer to
>> help target the issue.
>>
>>
>> DB Tsai | Siri Open Source Technologies [not a contribution] |  Apple, Inc
>>
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>
>
> --
> Marcelo
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>


-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] SPIP: Identifiers for multi-catalog Spark

2019-02-19 Thread Felix Cheung
+1



From: Ryan Blue 
Sent: Tuesday, February 19, 2019 9:34 AM
To: Jamison Bennett
Cc: dev
Subject: Re: [VOTE] SPIP: Identifiers for multi-catalog Spark

+1

On Tue, Feb 19, 2019 at 8:41 AM Jamison Bennett 
 wrote:
+1 (non-binding)


Jamison Bennett

Cloudera Software Engineer

jamison.benn...@cloudera.com

515 Congress Ave, Suite 1212   |   Austin, TX   |   78701


On Tue, Feb 19, 2019 at 10:33 AM Maryann Xue 
mailto:maryann@databricks.com>> wrote:
+1

On Mon, Feb 18, 2019 at 10:46 PM John Zhuge 
mailto:jzh...@apache.org>> wrote:
+1

On Mon, Feb 18, 2019 at 8:43 PM Dongjoon Hyun 
mailto:dongjoon.h...@gmail.com>> wrote:
+1

Dongjoon.

On 2019/02/19 04:12:23, Wenchen Fan 
mailto:cloud0...@gmail.com>> wrote:
> +1
>
> On Tue, Feb 19, 2019 at 10:50 AM Ryan Blue 
> wrote:
>
> > Hi everyone,
> >
> > It looks like there is consensus on the proposal, so I'd like to start a
> > vote thread on the SPIP for identifiers in multi-catalog Spark.
> >
> > The doc is available here:
> > https://docs.google.com/document/d/1jEcvomPiTc5GtB9F7d2RTVVpMY64Qy7INCA_rFEd9HQ/edit?usp=sharing
> >
> > Please vote in the next 3 days.
> >
> > [ ] +1: Accept the proposal as an official SPIP
> > [ ] +0
> > [ ] -1: I don't think this is a good idea because ...
> >
> >
> > Thanks!
> >
> > rb
> >
> > --
> > Ryan Blue
> > Software Engineer
> > Netflix
> >
>

-
To unsubscribe e-mail: 
dev-unsubscr...@spark.apache.org



--
John Zhuge


--
Ryan Blue
Software Engineer
Netflix


Re: SparkR + binary type + how to get value

2019-02-19 Thread Felix Cheung
From the second image it looks like there is a protocol mismatch. I’d check if 
the SparkR package running on the Livy machine matches the Spark Java release.

But in any case this seems more like an issue with the Livy config. I’d suggest 
checking with the community there:




From: Thijs Haarhuis 
Sent: Tuesday, February 19, 2019 5:28 AM
To: Felix Cheung; user@spark.apache.org
Subject: Re: SparkR + binary type + how to get value

Hi Felix,

Thanks. I got it working now by using the unlist function.

I have another question that maybe you can help me with, since I did see your 
name popping up regarding the spark.lapply function.
I am using Apache Livy and am having trouble using this function; I even 
reported a JIRA ticket for it at:
https://jira.apache.org/jira/browse/LIVY-558

When I call the spark.lapply function it reports that SparkR is not initialized.
I have looked into the spark.lapply function and it seems there is no spark 
context.
Any idea how I can debug this?

I hope you can help.

Regards,
Thijs


From: Felix Cheung 
Sent: Sunday, February 17, 2019 7:18 PM
To: Thijs Haarhuis; user@spark.apache.org
Subject: Re: SparkR + binary type + how to get value

A byte buffer in R is the raw vector type, so it seems like it is working as 
expected. What do you have in the raw bytes? You could convert them into other 
types or access individual bytes directly...

https://stat.ethz.ch/R-manual/R-devel/library/base/html/raw.html



From: Thijs Haarhuis 
Sent: Thursday, February 14, 2019 4:01 AM
To: Felix Cheung; user@spark.apache.org
Subject: Re: SparkR + binary type + how to get value

Hi Felix,
Sure..

I have the following code:

  printSchema(results)
  cat("\n\n\n")

  firstRow <- first(results)
  value <- firstRow$value

  cat(paste0("Value Type: '",typeof(value),"'\n\n\n"))
  cat(paste0("Value: '",value,"'\n\n\n"))

results is a Spark Data Frame here.

When I run this code the following is printed to console:

[inline screenshot of the console output omitted]

You can see there is only a single column in this sdf, of type binary.
When I collect this value and print its type, it prints that it is a list.

Any idea how to get the actual value, or how to process the individual bytes?

Thanks
Thijs


From: Felix Cheung 
Sent: Thursday, February 14, 2019 5:31 AM
To: Thijs Haarhuis; user@spark.apache.org
Subject: Re: SparkR + binary type + how to get value

Please share your code



From: Thijs Haarhuis 
Sent: Wednesday, February 13, 2019 6:09 AM
To: user@spark.apache.org
Subject: SparkR + binary type + how to get value


Hi all,



Does anybody have any experience in accessing the data from a column which has 
a binary type in a Spark Data Frame in R?

I have a Spark Data Frame which has a column which is of a binary type. I want 
to access this data and process it.

In my case I collect the spark data frame to a R data frame and access the 
first row.

When I print this row to the console it does print all the hex values correctly.



However, when I access the column it prints that it is a list of 1. When I print 
the type of the child element, it again prints that it is a list.

I expected this value to be of a raw type.



Anybody has some experience with this?



Thanks

Thijs




Re: Missing SparkR in CRAN

2019-02-19 Thread Felix Cheung
We are waiting for an update from CRAN. Please hold on.



From: Takeshi Yamamuro 
Sent: Tuesday, February 19, 2019 2:53 PM
To: dev
Subject: Re: Missing SparkR in CRAN

Hi, guys

It seems SparkR is still not found on CRAN; was there any problem
when resubmitting it?


On Fri, Jan 25, 2019 at 1:41 AM Felix Cheung 
mailto:felixche...@apache.org>> wrote:
Yes it was discussed on dev@. We are waiting for 2.3.3 to release to resubmit.


On Thu, Jan 24, 2019 at 5:33 AM Hyukjin Kwon 
mailto:gurwls...@gmail.com>> wrote:
Hi all,

I happened to find SparkR is missing in CRAN. See 
https://cran.r-project.org/web/packages/SparkR/index.html

I remember I saw some threads about this in spark-dev mailing list a long long 
ago IIRC. Is it in progress to fix it somewhere? or is it something I 
misunderstood?


--
---
Takeshi Yamamuro


[jira] [Created] (SPARK-26918) All .md should have ASF license header

2019-02-18 Thread Felix Cheung (JIRA)
Felix Cheung created SPARK-26918:


 Summary: All .md should have ASF license header
 Key: SPARK-26918
 URL: https://issues.apache.org/jira/browse/SPARK-26918
 Project: Spark
  Issue Type: Bug
  Components: Documentation
Affects Versions: 2.4.0, 3.0.0
Reporter: Felix Cheung


per policy, all md files should have the header, like eg. 
[https://raw.githubusercontent.com/apache/arrow/master/docs/README.md]

currently it does not

[https://raw.githubusercontent.com/apache/spark/master/docs/sql-reference.md] 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26918) All .md should have ASF license header

2019-02-18 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-26918:
-
Description: 
per policy, all md files should have the header, like eg. 
[https://raw.githubusercontent.com/apache/arrow/master/docs/README.md]

 

or

 

[https://raw.githubusercontent.com/apache/hadoop/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/filesystem.md]

 

currently it does not

[https://raw.githubusercontent.com/apache/spark/master/docs/sql-reference.md] 

  was:
per policy, all md files should have the header, like eg. 
[https://raw.githubusercontent.com/apache/arrow/master/docs/README.md]

currently it does not

[https://raw.githubusercontent.com/apache/spark/master/docs/sql-reference.md] 


> All .md should have ASF license header
> --
>
> Key: SPARK-26918
> URL: https://issues.apache.org/jira/browse/SPARK-26918
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Felix Cheung
>Priority: Major
>
> per policy, all md files should have the header, like eg. 
> [https://raw.githubusercontent.com/apache/arrow/master/docs/README.md]
>  
> or
>  
> [https://raw.githubusercontent.com/apache/hadoop/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/filesystem.md]
>  
> currently it does not
> [https://raw.githubusercontent.com/apache/spark/master/docs/sql-reference.md] 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


