[jira] [Commented] (SPARK-20684) expose createOrReplaceGlobalTempView/createGlobalTempView and dropGlobalTempView in SparkR

2020-07-28 Thread Felix Cheung (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-20684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166801#comment-17166801
 ] 

Felix Cheung commented on SPARK-20684:
--

[https://github.com/apache/spark/pull/17941#issuecomment-301669567]

[https://github.com/apache/spark/pull/19176#issuecomment-328292002]

[https://github.com/apache/spark/pull/19176#issuecomment-328292789]

 

> expose createOrReplaceGlobalTempView/createGlobalTempView and 
> dropGlobalTempView in SparkR
> --
>
> Key: SPARK-20684
> URL: https://issues.apache.org/jira/browse/SPARK-20684
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.2.0
>Reporter: Hossein Falaki
>Priority: Major
>
> This is a useful API that is not exposed in SparkR. It will help with moving 
> data between languages within a single Spark application.
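
For context, a minimal sketch of what the proposed SparkR wrappers might look like, mirroring the existing Scala/Python API. The createOrReplaceGlobalTempView() call below is the hypothetical function this issue asks for, not something SparkR exposes today; the global_temp database itself is a real Spark concept shared across sessions (and hence across language frontends) of one application.

{code:r}
library(SparkR)
sparkR.session()

df <- createDataFrame(faithful)

# Hypothetical SparkR wrapper proposed by this issue (not in SparkR today);
# the underlying JVM method Dataset.createOrReplaceGlobalTempView() does exist.
# createOrReplaceGlobalTempView(df, "faithful_gt")

# Even without the wrapper, a global temp view registered by another language
# (Scala/Python) in the same application is reachable from SparkR via SQL:
head(sql("SELECT * FROM global_temp.faithful_gt"))
{code}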






[jira] [Comment Edited] (SPARK-12172) Consider removing SparkR internal RDD APIs

2020-07-28 Thread Felix Cheung (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-12172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166581#comment-17166581
 ] 

Felix Cheung edited comment on SPARK-12172 at 7/29/20, 1:20 AM:


These are methods (map, etc.) that were never public and are not supported.

They were not callable unless you directly referenced the internal namespace 
SparkR:::
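
To make that concrete, here is a sketch of how these internal RDD methods had to be reached. The internal names (parallelize, map, collectRDD) are taken from the SparkR sources of that era and may differ across versions; none of this is a supported API, which is exactly the point.

{code:r}
library(SparkR)
sc <- sparkR.init()   # deprecated SparkContext handle, shown only for illustration

# None of these functions are exported; they are only reachable through ":::".
rdd     <- SparkR:::parallelize(sc, 1:10, 2L)
doubled <- SparkR:::map(rdd, function(x) x * 2L)
SparkR:::collectRDD(doubled)
{code}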


was (Author: felixcheung):
These are methods (map, etc.) that were never public and are not supported.

> Consider removing SparkR internal RDD APIs
> --
>
> Key: SPARK-12172
> URL: https://issues.apache.org/jira/browse/SPARK-12172
> Project: Spark
>  Issue Type: Task
>  Components: SparkR
>Reporter: Felix Cheung
>Priority: Major
>







[jira] [Comment Edited] (SPARK-12172) Consider removing SparkR internal RDD APIs

2020-07-28 Thread Felix Cheung (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-12172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166581#comment-17166581
 ] 

Felix Cheung edited comment on SPARK-12172 at 7/29/20, 1:19 AM:


These are methods (map, etc.) that were never public and are not supported.


was (Author: felixcheung):
These are methods (map, etc.) that were never public and are not supported.

On Tue, Jul 28, 2020 at 10:18 AM S Daniel Zafar (Jira) 



> Consider removing SparkR internal RDD APIs
> --
>
> Key: SPARK-12172
> URL: https://issues.apache.org/jira/browse/SPARK-12172
> Project: Spark
>  Issue Type: Task
>  Components: SparkR
>Reporter: Felix Cheung
>Priority: Major
>







[jira] [Commented] (SPARK-12172) Consider removing SparkR internal RDD APIs

2020-07-28 Thread Felix Cheung (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-12172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166581#comment-17166581
 ] 

Felix Cheung commented on SPARK-12172:
--

These are methods (map, etc.) that were never public and are not supported.

On Tue, Jul 28, 2020 at 10:18 AM S Daniel Zafar (Jira) 



> Consider removing SparkR internal RDD APIs
> --
>
> Key: SPARK-12172
> URL: https://issues.apache.org/jira/browse/SPARK-12172
> Project: Spark
>  Issue Type: Task
>  Components: SparkR
>Reporter: Felix Cheung
>Priority: Major
>







[jira] [Updated] (SPARK-29042) Sampling-based RDD with unordered input should be INDETERMINATE

2019-09-12 Thread Felix Cheung (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-29042:
-
Labels: correctness  (was: )

> Sampling-based RDD with unordered input should be INDETERMINATE
> ---
>
> Key: SPARK-29042
> URL: https://issues.apache.org/jira/browse/SPARK-29042
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Liang-Chi Hsieh
>Priority: Major
>  Labels: correctness
>
> We have found and fixed the correctness issue when RDD output is 
> INDETERMINATE. One missing part is sampling-based RDDs. This kind of RDD is 
> order-sensitive to its input. A sampling-based RDD with unordered input 
> should be INDETERMINATE.
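
A SparkR-flavored illustration of the point (the issue itself is about the RDD layer, so this only shows why a fixed seed is not enough; the functions used are the public SparkR API):

{code:r}
library(SparkR)
sparkR.session()

df <- createDataFrame(faithful)

# repartition() introduces a shuffle, so the row order seen by the next stage is
# not deterministic across recomputations of that stage
shuffled <- repartition(df, 8L)

# Sampling decides per row as it scans a partition, so even with a fixed seed the
# selected rows depend on the incoming order; if a retry sees a different order,
# it produces a different sample - hence the stage output is INDETERMINATE.
sampled <- sample(shuffled, withReplacement = FALSE, fraction = 0.1, seed = 42L)
head(collect(sampled))
{code}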






[jira] [Commented] (SPARK-27495) SPIP: Support Stage level resource configuration and scheduling

2019-09-01 Thread Felix Cheung (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-27495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920479#comment-16920479
 ] 

Felix Cheung commented on SPARK-27495:
--

+1 on this.

I've reviewed this. A few questions/comments:
 # In the description above there is a passage on "Spark internal use by 
catalyst". Looking at the rest of the material (Google doc, etc.), is this out of 
scope? If so, we should clarify.
 # "Different resources in multiple RDDs that get combined into a single stage" 
- this merge can be complicated, and I'm not sure taking the max, etc. is going 
to be right at all times. At the least it will be very confusing to the user 
how much resource is actually used. Instead of a heuristic such as the max, how 
about detecting a mismatch involving multiple RDDs, failing fast, and asking the 
user to do a "repartition" operation before that stage?
 # In a later comment, "resource requirement as a hint" - I am actually unsure 
about that. In many ML or DL/TensorFlow use cases where MPI or allreduce is 
involved, a strict number of GPUs, processes, and machines is required or else 
the job fails to start. I am in favor of a strict mode for that purpose.

> SPIP: Support Stage level resource configuration and scheduling
> ---
>
> Key: SPARK-27495
> URL: https://issues.apache.org/jira/browse/SPARK-27495
> Project: Spark
>  Issue Type: Epic
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Major
>
> *Q1.* What are you trying to do? Articulate your objectives using absolutely 
> no jargon.
> Objectives:
>  # Allow users to specify task and executor resource requirements at the 
> stage level. 
>  # Spark will use the stage level requirements to acquire the necessary 
> resources/executors and schedule tasks based on the per stage requirements.
> Many times users have different resource requirements for different stages of 
> their application so they want to be able to configure resources at the stage 
> level. For instance, you have a single job that has 2 stages. The first stage 
> does some  ETL which requires a lot of tasks, each with a small amount of 
> memory and 1 core each. Then you have a second stage where you feed that ETL 
> data into an ML algorithm. The second stage only requires a few executors but 
> each executor needs a lot of memory, GPUs, and many cores.  This feature 
> allows the user to specify the task and executor resource requirements for 
> the ETL Stage and then change them for the ML stage of the job.  
> Resources include cpu, memory (on heap, overhead, pyspark, and off heap), and 
> extra Resources (GPU/FPGA/etc). It has the potential to allow for other 
> things like limiting the number of tasks per stage, specifying other 
> parameters for things like shuffle, etc. Initially I would propose we only 
> support resources as they are now. So Task resources would be cpu and other 
> resources (GPU, FPGA), that way we aren't adding in extra scheduling things 
> at this point.  Executor resources would be cpu, memory, and extra 
> resources(GPU,FPGA, etc). Changing the executor resources will rely on 
> dynamic allocation being enabled.
> Main use cases:
>  # ML use case where user does ETL and feeds it into an ML algorithm where 
> it’s using the RDD API. This should work with barrier scheduling as well once 
> it supports dynamic allocation.
>  # Spark internal use by catalyst. Catalyst could control the stage level 
> resources as it finds the need to change it between stages for different 
> optimizations. For instance, with the new columnar plugin to the query 
> planner we can insert stages into the plan that would change running 
> something on the CPU in row format to running it on the GPU in columnar 
> format. This API would allow the planner to make sure the stages that run on 
> the GPU get the corresponding GPU resources it needs to run. Another possible 
> use case for catalyst is that it would allow catalyst to add in more 
> optimizations to where the user doesn’t need to configure container sizes at 
> all. If the optimizer/planner can handle that for the user, everyone wins.
> This SPIP focuses on the RDD API but we don’t exclude the Dataset API. I 
> think the DataSet API will require more changes because it specifically hides 
> the RDD from the users via the plans and catalyst can optimize the plan and 
> insert things into the plan. The only way I’ve found to make this work with 
> the Dataset API would be modifying all the plans to be able to get the 
> resource requirements down into where it creates the RDDs, which I believe 
> would be a lot of change.  If other people know better options, it would be 
> great to hear them.
> *Q2.* What problem is this 

[jira] [Commented] (SPARK-28594) Allow event logs for running streaming apps to be rolled over.

2019-09-01 Thread Felix Cheung (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920476#comment-16920476
 ] 

Felix Cheung commented on SPARK-28594:
--

Reviewed. Looks reasonable to me. I can help shepherd this work.

 

ping [~srowen] [~vanzin] [~irashid] for feedback.

> Allow event logs for running streaming apps to be rolled over.
> --
>
> Key: SPARK-28594
> URL: https://issues.apache.org/jira/browse/SPARK-28594
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
> Environment: This has been reported on 2.0.2.22 but affects all 
> currently available versions.
>Reporter: Stephen Levett
>Priority: Major
>
> In all current Spark releases, when event logging is enabled for Spark 
> Streaming, the event logs grow massively.  The files continue to grow until the 
> application is stopped or killed.
> The Spark history server then has difficulty processing the files.
> https://issues.apache.org/jira/browse/SPARK-8617
> addresses .inprogress files but not the event logs of applications that are still running.
> Identify a mechanism to set a "max file" size so that the file is rolled over 
> when it reaches this size?
>  
>  
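
For reference, the shape such a setting could take - these are the names used by the rolling event log support that later shipped in Spark 3.x, as I understand the released documentation; compaction/retention of the rolled files is configured separately on the history server side:

{code}
spark.eventLog.rolling.enabled      true
spark.eventLog.rolling.maxFileSize  128m
{code}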






[jira] [Updated] (SPARK-28594) Allow event logs for running streaming apps to be rolled over.

2019-09-01 Thread Felix Cheung (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-28594:
-
Shepherd: Felix Cheung

> Allow event logs for running streaming apps to be rolled over.
> --
>
> Key: SPARK-28594
> URL: https://issues.apache.org/jira/browse/SPARK-28594
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
> Environment: This has been reported on 2.0.2.22 but affects all 
> currently available versions.
>Reporter: Stephen Levett
>Priority: Major
>
> In all current Spark releases, when event logging is enabled for Spark 
> Streaming, the event logs grow massively.  The files continue to grow until the 
> application is stopped or killed.
> The Spark history server then has difficulty processing the files.
> https://issues.apache.org/jira/browse/SPARK-8617
> addresses .inprogress files but not the event logs of applications that are still running.
> Identify a mechanism to set a "max file" size so that the file is rolled over 
> when it reaches this size?
>  
>  






[jira] [Commented] (SPARK-21367) R older version of Roxygen2 on Jenkins

2019-05-12 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-21367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838188#comment-16838188
 ] 

Felix Cheung commented on SPARK-21367:
--

great thx!

> R older version of Roxygen2 on Jenkins
> --
>
> Key: SPARK-21367
> URL: https://issues.apache.org/jira/browse/SPARK-21367
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.3.0
>Reporter: Felix Cheung
>Assignee: shane knapp
>Priority: Major
> Attachments: R.paks
>
>
> Getting this message from a recent build.
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79461/console
> Warning messages:
> 1: In check_dep_version(pkg, version, compare) :
>   Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
> 2: In check_dep_version(pkg, version, compare) :
>   Need roxygen2 >= 5.0.0 but loaded version is 4.1.1
> * installing *source* package 'SparkR' ...
> ** R
> We have been running with 5.0.1 and haven't changed for a year.
> NOTE: Roxygen 6.x has some big changes and IMO we should not move to that yet.
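
A quick way to see what a Jenkins worker actually has installed, and to pin the expected version (assumption: the devtools package is available on the worker; install_version is re-exported from remotes):

{code:r}
packageVersion("roxygen2")   # 4.1.1 here matches the warning above; the build expects 5.0.x

# Pin the version the SparkR docs build was written against (assumes devtools
# is installed on the worker):
devtools::install_version("roxygen2", version = "5.0.1",
                          repos = "https://cloud.r-project.org")
{code}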






[jira] [Commented] (SPARK-27684) Reduce ScalaUDF conversion overheads for primitives

2019-05-12 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838187#comment-16838187
 ] 

Felix Cheung commented on SPARK-27684:
--

definitely could be interesting..

> Reduce ScalaUDF conversion overheads for primitives
> ---
>
> Key: SPARK-27684
> URL: https://issues.apache.org/jira/browse/SPARK-27684
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Josh Rosen
>Priority: Major
>
> I believe that we can reduce ScalaUDF overheads when operating over primitive 
> types.
> In [ScalaUDF's 
> doGenCode|https://github.com/apache/spark/blob/5a8aad01c2aaf0ceef8e9a3cfabbd2e88c8d9f0d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala#L991]
>  we have logic to convert UDF function input types from Catalyst internal 
> types to Scala types (for example, this is used to convert UTF8Strings to 
> Java Strings). Similarly, we convert UDF return types.
> However, UDF input argument conversion is effectively a no-op for primitive 
> types because {{CatalystTypeConverters.createToScalaConverter()}} returns 
> {{identity}} in those cases. UDF result conversion is a little trickier 
> because {{createToCatalystConverter()}} returns [a 
> function|https://github.com/apache/spark/blob/5a8aad01c2aaf0ceef8e9a3cfabbd2e88c8d9f0d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala#L413]
>  that handles {{Option[Primitive]}}, but it might be the case that the 
> Option-boxing is unusable via ScalaUDF (in which case the conversion truly is 
> an {{identity}} no-op).
> These unnecessary no-op conversions could be quite expensive because each 
> call involves an index into the {{references}} array to get the converters, a 
> second index into the converters array to get the correct converter for the 
> nth input argument, and, finally, the converter invocation itself:
> {code:java}
> Object project_arg_0 = false ? null : ((scala.Function1[]) references[1] /* 
> converters */)[0].apply(project_value_3);{code}
> In these cases, I believe that we can reduce lookup / invocation overheads by 
> modifying the ScalaUDF code generation to eliminate the conversion calls for 
> primitives and directly assign the unconverted result, e.g.
> {code:java}
> Object project_arg_0 = false ? null : project_value_3;{code}
> To cleanly handle the case where we have a multi-argument UDF accepting a 
> mixture of primitive and non-primitive types, we might be able to keep the 
> {{converters}} array the same size (so indexes stay the same) but omit the 
> invocation of the converters for the primitive arguments (e.g. {{converters}} 
> is sparse / contains unused entries in case of primitives).
> I spotted this optimization while trying to construct some quick benchmarks 
> to measure UDF invocation overheads. For example:
> {code:java}
> spark.udf.register("identity", (x: Int) => x)
> sql("select id, id * 2, id * 3 from range(1000 * 1000 * 1000)").rdd.count() 
> // ~ 52 seconds
> sql("select identity(id), identity(id * 2), identity(id * 3) from range(1000 
> * 1000 * 1000)").rdd.count() // ~84 seconds{code}
> I'm curious to see whether the optimization suggested here can close this 
> performance gap. It'd also be a good idea to construct more principled 
> microbenchmarks covering multi-argument UDFs, projections involving multiple 
> UDFs over different input and output types, etc.
>  






[jira] [Updated] (SPARK-21805) disable R vignettes code on Windows

2019-04-06 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-21805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-21805:
-
Issue Type: Sub-task  (was: Bug)
Parent: SPARK-15799

> disable R vignettes code on Windows
> ---
>
> Key: SPARK-21805
> URL: https://issues.apache.org/jira/browse/SPARK-21805
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 2.2.0, 2.3.0
>Reporter: Felix Cheung
>Assignee: Felix Cheung
>Priority: Major
> Fix For: 2.2.1, 2.3.0
>
>







[jira] [Updated] (SPARK-22344) Prevent R CMD check from using /tmp

2019-04-06 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-22344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-22344:
-
Issue Type: Sub-task  (was: Bug)
Parent: SPARK-15799

> Prevent R CMD check from using /tmp
> ---
>
> Key: SPARK-22344
> URL: https://issues.apache.org/jira/browse/SPARK-22344
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 1.6.3, 2.1.2, 2.2.0, 2.3.0
>Reporter: Shivaram Venkataraman
>Assignee: Shivaram Venkataraman
>Priority: Major
> Fix For: 2.2.1, 2.3.0
>
>
> When R CMD check is run on the SparkR package it leaves behind files in /tmp 
> which is a violation of CRAN policy. We should instead write to Rtmpdir. 
> Notes from CRAN are below
> {code}
> Checking this leaves behind dirs
>hive/$USER
>$USER
> and files named like
>b4f6459b-0624-4100-8358-7aa7afbda757_resources
> in /tmp, in violation of the CRAN Policy.
> {code}
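
A minimal sketch of the general approach, assuming the scratch locations can be derived from R's session temp directory (during R CMD check, tempdir() resolves to the check's own Rtmp area, so nothing is left directly under /tmp). The directories CRAN flagged come from Hive/warehouse scratch paths, so this only illustrates the direction of the fix, not the exact change:

{code:r}
library(SparkR)

# tempdir() points at the session-specific Rtmp directory during R CMD check
scratch <- file.path(tempdir(), "spark-warehouse")

sparkR.session(sparkConfig = list(
  spark.sql.warehouse.dir = scratch,
  spark.local.dir         = file.path(tempdir(), "spark-local")
))
{code}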






[jira] [Updated] (SPARK-24535) Fix java version parsing in SparkR on Windows

2019-04-06 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-24535:
-
Issue Type: Sub-task  (was: Bug)
Parent: SPARK-15799

> Fix java version parsing in SparkR on Windows
> -
>
> Key: SPARK-24535
> URL: https://issues.apache.org/jira/browse/SPARK-24535
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 2.3.1, 2.4.0
>Reporter: Shivaram Venkataraman
>Assignee: Felix Cheung
>Priority: Blocker
> Fix For: 2.3.2, 2.4.0
>
>
> We see errors on CRAN of the form 
> {code:java}
>   java version "1.8.0_144"
>   Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
>   Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
>   Picked up _JAVA_OPTIONS: -XX:-UsePerfData 
>   -- 1. Error: create DataFrame from list or data.frame (@test_basic.R#21)  
> --
>   subscript out of bounds
>   1: sparkR.session(master = sparkRTestMaster, enableHiveSupport = FALSE, 
> sparkConfig = sparkRTestConfig) at 
> D:/temp/RtmpIJ8Cc3/RLIBS_3242c713c3181/SparkR/tests/testthat/test_basic.R:21
>   2: sparkR.sparkContext(master, appName, sparkHome, sparkConfigMap, 
> sparkExecutorEnvMap, 
>  sparkJars, sparkPackages)
>   3: checkJavaVersion()
>   4: strsplit(javaVersionFilter[[1]], "[\"]")
> {code}
> The complete log file is at 
> http://home.apache.org/~shivaram/SparkR_2.3.1_check_results/Windows/00check.log
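
The failure is in the strsplit() call shown in the traceback. A hedged sketch in plain base R (not the actual SparkR fix) of parsing that tolerates extra lines such as the _JAVA_OPTIONS notice:

{code:r}
# Output as captured on the CRAN Windows machine (`java -version` writes to stderr)
javaVersionOut <- c("Picked up _JAVA_OPTIONS: -XX:-UsePerfData",
                    "java version \"1.8.0_144\"")

# Keep only lines that actually contain a quoted version string before splitting,
# so an unexpected extra line cannot trigger "subscript out of bounds"
versionLine <- grep(" version ", javaVersionOut, fixed = TRUE, value = TRUE)
if (length(versionLine) > 0) {
  javaVersionStr <- strsplit(versionLine[1], "\"", fixed = TRUE)[[1]][2]
  javaVersionStr   # "1.8.0_144"
}
{code}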






[jira] [Updated] (SPARK-25572) SparkR tests failed on CRAN on Java 10

2019-04-06 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-25572:
-
Issue Type: Sub-task  (was: Bug)
Parent: SPARK-15799

> SparkR tests failed on CRAN on Java 10
> --
>
> Key: SPARK-25572
> URL: https://issues.apache.org/jira/browse/SPARK-25572
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 2.4.0
>Reporter: Felix Cheung
>Assignee: Felix Cheung
>Priority: Major
> Fix For: 2.3.3, 2.4.0
>
>
> Follow-up to SPARK-24255.
> From the 2.3.2 release we can see that CRAN doesn't seem to respect the system 
> requirements when running tests - we have seen cases where SparkR is run on 
> Java 10, which unfortunately Spark does not start on. For 2.4.x, let's attempt 
> to skip all tests.






[jira] [Updated] (SPARK-26010) SparkR vignette fails on CRAN on Java 11

2019-04-06 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-26010:
-
Issue Type: Sub-task  (was: Bug)
Parent: SPARK-15799

> SparkR vignette fails on CRAN on Java 11
> 
>
> Key: SPARK-26010
> URL: https://issues.apache.org/jira/browse/SPARK-26010
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Felix Cheung
>Assignee: Felix Cheung
>Priority: Major
> Fix For: 2.3.3, 2.4.1, 3.0.0
>
>
> Follow-up to SPARK-25572, but for vignettes.
>  






[jira] [Commented] (SPARK-15799) Release SparkR on CRAN

2019-04-06 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-15799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811654#comment-16811654
 ] 

Felix Cheung commented on SPARK-15799:
--

More fixes for this (did not open a JIRA):

[https://github.com/apache/spark/commit/fa0f791d4d9f083a45ab631a2e9f88a6b749e416#diff-e1e1d3d40573127e9ee0480caf1283d6]

[https://github.com/apache/spark/commit/927081dd959217ed6bf014557db20026d7e22672#diff-e1e1d3d40573127e9ee0480caf1283d6]

 

> Release SparkR on CRAN
> --
>
> Key: SPARK-15799
> URL: https://issues.apache.org/jira/browse/SPARK-15799
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Reporter: Xiangrui Meng
>Assignee: Shivaram Venkataraman
>Priority: Major
> Fix For: 2.1.2
>
>
> Story: "As an R user, I would like to see SparkR released on CRAN, so I can 
> use SparkR easily in an existing R environment and have other packages built 
> on top of SparkR."
> I made this JIRA with the following questions in mind:
> * Are there known issues that prevent us releasing SparkR on CRAN?
> * Do we want to package Spark jars in the SparkR release?
> * Are there license issues?
> * How does it fit into Spark's release process?






[jira] [Resolved] (SPARK-26910) Re-release SparkR to CRAN

2019-04-06 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung resolved SPARK-26910.
--
   Resolution: Fixed
Fix Version/s: 2.4.1

2.4.1 released: [https://cran.r-project.org/web/packages/SparkR/index.html]

> Re-release SparkR to CRAN
> -
>
> Key: SPARK-26910
> URL: https://issues.apache.org/jira/browse/SPARK-26910
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Affects Versions: 2.4.0
>Reporter: Michael Chirico
>Assignee: Felix Cheung
>Priority: Major
> Fix For: 2.4.1
>
>
> The logical successor to https://issues.apache.org/jira/browse/SPARK-15799
> I don't see anything specifically tracking re-release in the Jira list. It 
> would be helpful to have an issue tracking this to refer to as an outsider, 
> as well as to document what the blockers are in case some outside help could 
> be useful.
>  * Is there a plan to re-release SparkR to CRAN?
>  * What are the major blockers to doing so at the moment?






[jira] [Commented] (SPARK-27389) pyspark test failures w/ "UnknownTimeZoneError: 'US/Pacific-New'"

2019-04-05 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-27389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16811461#comment-16811461
 ] 

Felix Cheung commented on SPARK-27389:
--

Maybe a new JDK changes the TimeZone data?

> pyspark test failures w/ "UnknownTimeZoneError: 'US/Pacific-New'"
> -
>
> Key: SPARK-27389
> URL: https://issues.apache.org/jira/browse/SPARK-27389
> Project: Spark
>  Issue Type: Task
>  Components: jenkins, PySpark
>Affects Versions: 3.0.0
>Reporter: Imran Rashid
>Assignee: shane knapp
>Priority: Major
>
> I've seen a few odd PR build failures w/ an error in pyspark tests about 
> "UnknownTimeZoneError: 'US/Pacific-New'".  eg. 
> https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4688/consoleFull
> A bit of searching tells me that US/Pacific-New probably isn't really 
> supposed to be a timezone at all: 
> https://mm.icann.org/pipermail/tz/2009-February/015448.html
> I'm guessing that this is from some misconfiguration of jenkins.  that said, 
> I can't figure out what is wrong.  There does seem to be a timezone entry for 
> US/Pacific-New in {{/usr/share/zoneinfo/US/Pacific-New}} -- but it seems to 
> be there on every amp-jenkins-worker, so I dunno how that alone would cause 
> this failure sometimes.
> [~shaneknapp] I am tentatively calling this a "jenkins" issue, but I might be 
> totally wrong here and it is really a pyspark problem.
> Full Stack trace from the test failure:
> {noformat}
> ==
> ERROR: test_to_pandas (pyspark.sql.tests.test_dataframe.DataFrameTests)
> --
> Traceback (most recent call last):
>   File 
> "/home/jenkins/workspace/NewSparkPullRequestBuilder@2/python/pyspark/sql/tests/test_dataframe.py",
>  line 522, in test_to_pandas
> pdf = self._to_pandas()
>   File 
> "/home/jenkins/workspace/NewSparkPullRequestBuilder@2/python/pyspark/sql/tests/test_dataframe.py",
>  line 517, in _to_pandas
> return df.toPandas()
>   File 
> "/home/jenkins/workspace/NewSparkPullRequestBuilder@2/python/pyspark/sql/dataframe.py",
>  line 2189, in toPandas
> _check_series_convert_timestamps_local_tz(pdf[field.name], timezone)
>   File 
> "/home/jenkins/workspace/NewSparkPullRequestBuilder@2/python/pyspark/sql/types.py",
>  line 1891, in _check_series_convert_timestamps_local_tz
> return _check_series_convert_timestamps_localize(s, None, timezone)
>   File 
> "/home/jenkins/workspace/NewSparkPullRequestBuilder@2/python/pyspark/sql/types.py",
>  line 1877, in _check_series_convert_timestamps_localize
> lambda ts: ts.tz_localize(from_tz, 
> ambiguous=False).tz_convert(to_tz).tz_localize(None)
>   File "/home/anaconda/lib/python2.7/site-packages/pandas/core/series.py", 
> line 2294, in apply
> mapped = lib.map_infer(values, f, convert=convert_dtype)
>   File "pandas/src/inference.pyx", line 1207, in pandas.lib.map_infer 
> (pandas/lib.c:66124)
>   File 
> "/home/jenkins/workspace/NewSparkPullRequestBuilder@2/python/pyspark/sql/types.py",
>  line 1878, in 
> if ts is not pd.NaT else pd.NaT)
>   File "pandas/tslib.pyx", line 649, in pandas.tslib.Timestamp.tz_convert 
> (pandas/tslib.c:13923)
>   File "pandas/tslib.pyx", line 407, in pandas.tslib.Timestamp.__new__ 
> (pandas/tslib.c:10447)
>   File "pandas/tslib.pyx", line 1467, in pandas.tslib.convert_to_tsobject 
> (pandas/tslib.c:27504)
>   File "pandas/tslib.pyx", line 1768, in pandas.tslib.maybe_get_tz 
> (pandas/tslib.c:32362)
>   File "/home/anaconda/lib/python2.7/site-packages/pytz/__init__.py", line 
> 178, in timezone
> raise UnknownTimeZoneError(zone)
> UnknownTimeZoneError: 'US/Pacific-New'
> {noformat}






[jira] [Commented] (SPARK-26910) Re-release SparkR to CRAN

2019-03-13 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16792120#comment-16792120
 ] 

Felix Cheung commented on SPARK-26910:
--

2.3.3 failed. We are waiting for 2.4.1 to be released.

> Re-release SparkR to CRAN
> -
>
> Key: SPARK-26910
> URL: https://issues.apache.org/jira/browse/SPARK-26910
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Affects Versions: 2.4.0
>Reporter: Michael Chirico
>Assignee: Felix Cheung
>Priority: Major
>
> The logical successor to https://issues.apache.org/jira/browse/SPARK-15799
> I don't see anything specifically tracking re-release in the Jira list. It 
> would be helpful to have an issue tracking this to refer to as an outsider, 
> as well as to document what the blockers are in case some outside help could 
> be useful.
>  * Is there a plan to re-release SparkR to CRAN?
>  * What are the major blockers to doing so at the moment?






[jira] [Commented] (SPARK-26604) Register channel for stream request

2019-03-06 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786397#comment-16786397
 ] 

Felix Cheung commented on SPARK-26604:
--

Could we backport this to branch-2.4?

> Register channel for stream request
> ---
>
> Key: SPARK-26604
> URL: https://issues.apache.org/jira/browse/SPARK-26604
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Liang-Chi Hsieh
>Assignee: Liang-Chi Hsieh
>Priority: Minor
> Fix For: 3.0.0
>
>
> Now in {{TransportRequestHandler.processStreamRequest}}, when a stream 
> request is processed, the stream id is not registered with the current 
> channel in the stream manager. It should be, so that if the channel 
> gets terminated we can also remove the streams associated with its requests.






[jira] [Comment Edited] (SPARK-26918) All .md should have ASF license header

2019-03-04 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784109#comment-16784109
 ] 

Felix Cheung edited comment on SPARK-26918 at 3/5/19 5:47 AM:
--

[~rmsm...@gmail.com] - you don't need to check out a tag (or a release) - just 
check out master into a local branch to test.


was (Author: felixcheung):
[~rmsm...@gmail.com] - you don't need to checkout a tag - just checkout master 
into a local branch to test 

> All .md should have ASF license header
> --
>
> Key: SPARK-26918
> URL: https://issues.apache.org/jira/browse/SPARK-26918
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Felix Cheung
>Priority: Minor
>
> per policy, all md files should have the header, like eg. 
> [https://raw.githubusercontent.com/apache/arrow/master/docs/README.md]
>  or
> [https://raw.githubusercontent.com/apache/hadoop/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/filesystem.md]
>  
> currently it does not
> [https://raw.githubusercontent.com/apache/spark/master/docs/sql-reference.md] 






[jira] [Commented] (SPARK-26918) All .md should have ASF license header

2019-03-04 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16784109#comment-16784109
 ] 

Felix Cheung commented on SPARK-26918:
--

[~rmsm...@gmail.com] - you don't need to check out a tag - just check out master 
into a local branch to test.

> All .md should have ASF license header
> --
>
> Key: SPARK-26918
> URL: https://issues.apache.org/jira/browse/SPARK-26918
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Felix Cheung
>Priority: Minor
>
> per policy, all md files should have the header, like eg. 
> [https://raw.githubusercontent.com/apache/arrow/master/docs/README.md]
>  or
> [https://raw.githubusercontent.com/apache/hadoop/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/filesystem.md]
>  
> currently it does not
> [https://raw.githubusercontent.com/apache/spark/master/docs/sql-reference.md] 






[jira] [Commented] (SPARK-26918) All .md should have ASF license header

2019-03-02 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16782565#comment-16782565
 ] 

Felix Cheung commented on SPARK-26918:
--

[~srowen] what do you think?

> All .md should have ASF license header
> --
>
> Key: SPARK-26918
> URL: https://issues.apache.org/jira/browse/SPARK-26918
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Felix Cheung
>Priority: Minor
>
> per policy, all md files should have the header, like eg. 
> [https://raw.githubusercontent.com/apache/arrow/master/docs/README.md]
>  or
> [https://raw.githubusercontent.com/apache/hadoop/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/filesystem.md]
>  
> currently it does not
> [https://raw.githubusercontent.com/apache/spark/master/docs/sql-reference.md] 






[jira] [Commented] (SPARK-26918) All .md should have ASF license header

2019-03-02 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16782564#comment-16782564
 ] 

Felix Cheung commented on SPARK-26918:
--

I'm for doing this (reopening this issue).

Also, [~rmsm...@gmail.com], this needs to:
 # apply to all .md files
 # remove the RAT filter for .md afterwards
 # run the doc build to check that the docs are generated properly

i.e. the commands at the beginning of the "Update the Spark Website" section of 
https://spark.apache.org/release-process.html:

{{$ cd docs }}

{{$ PRODUCTION=1 jekyll build }}
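
One common approach (used by the examples linked in the description, as I recall) is to wrap the standard ASF source header in an HTML comment so it does not render in the generated page; adding roughly the following at the top of each .md file is the shape of the change:

{code}
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements.  See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License.  You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
{code}

Pages that start with Jekyll front matter may need the comment placed after the front-matter block; the jekyll build above is the way to verify nothing breaks.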

> All .md should have ASF license header
> --
>
> Key: SPARK-26918
> URL: https://issues.apache.org/jira/browse/SPARK-26918
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Felix Cheung
>Priority: Minor
>
> per policy, all md files should have the header, like eg. 
> [https://raw.githubusercontent.com/apache/arrow/master/docs/README.md]
>  or
> [https://raw.githubusercontent.com/apache/hadoop/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/filesystem.md]
>  
> currently it does not
> [https://raw.githubusercontent.com/apache/spark/master/docs/sql-reference.md] 






[jira] [Comment Edited] (SPARK-26918) All .md should have ASF license header

2019-02-23 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16776080#comment-16776080
 ] 

Felix Cheung edited comment on SPARK-26918 at 2/24/19 12:36 AM:


Actually, it does - there was a comment in the ASF incubator that Spark is 
setting the wrong example for all its .md files. It's likely better to fix this 
before more direct feedback comes down.

[https://www.apache.org/legal/src-headers.html#headers]

[https://www.apache.org/legal/src-headers.html#faq-docs]


was (Author: felixcheung):
actually, it does - it's a comment in ASF incubator that spark is setting the 
wrong example for all .md files

https://www.apache.org/legal/src-headers.html#headers

https://www.apache.org/legal/src-headers.html#faq-docs

> All .md should have ASF license header
> --
>
> Key: SPARK-26918
> URL: https://issues.apache.org/jira/browse/SPARK-26918
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Felix Cheung
>Priority: Minor
>
> per policy, all md files should have the header, like eg. 
> [https://raw.githubusercontent.com/apache/arrow/master/docs/README.md]
>  or
> [https://raw.githubusercontent.com/apache/hadoop/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/filesystem.md]
>  
> currently it does not
> [https://raw.githubusercontent.com/apache/spark/master/docs/sql-reference.md] 






[jira] [Commented] (SPARK-26918) All .md should have ASF license header

2019-02-23 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16776080#comment-16776080
 ] 

Felix Cheung commented on SPARK-26918:
--

Actually, it does - there was a comment in the ASF incubator that Spark is 
setting the wrong example for all its .md files.

https://www.apache.org/legal/src-headers.html#headers

https://www.apache.org/legal/src-headers.html#faq-docs

> All .md should have ASF license header
> --
>
> Key: SPARK-26918
> URL: https://issues.apache.org/jira/browse/SPARK-26918
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Felix Cheung
>Priority: Minor
>
> per policy, all md files should have the header, like eg. 
> [https://raw.githubusercontent.com/apache/arrow/master/docs/README.md]
>  or
> [https://raw.githubusercontent.com/apache/hadoop/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/filesystem.md]
>  
> currently it does not
> [https://raw.githubusercontent.com/apache/spark/master/docs/sql-reference.md] 






[jira] [Created] (SPARK-26918) All .md should have ASF license header

2019-02-18 Thread Felix Cheung (JIRA)
Felix Cheung created SPARK-26918:


 Summary: All .md should have ASF license header
 Key: SPARK-26918
 URL: https://issues.apache.org/jira/browse/SPARK-26918
 Project: Spark
  Issue Type: Bug
  Components: Documentation
Affects Versions: 2.4.0, 3.0.0
Reporter: Felix Cheung


per policy, all md files should have the header, like eg. 
[https://raw.githubusercontent.com/apache/arrow/master/docs/README.md]

currently it does not

[https://raw.githubusercontent.com/apache/spark/master/docs/sql-reference.md] 






[jira] [Updated] (SPARK-26918) All .md should have ASF license header

2019-02-18 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-26918:
-
Description: 
per policy, all md files should have the header, like eg. 
[https://raw.githubusercontent.com/apache/arrow/master/docs/README.md]

 

or

 

[https://raw.githubusercontent.com/apache/hadoop/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/filesystem.md]

 

currently it does not

[https://raw.githubusercontent.com/apache/spark/master/docs/sql-reference.md] 

  was:
per policy, all md files should have the header, like eg. 
[https://raw.githubusercontent.com/apache/arrow/master/docs/README.md]

currently it does not

[https://raw.githubusercontent.com/apache/spark/master/docs/sql-reference.md] 


> All .md should have ASF license header
> --
>
> Key: SPARK-26918
> URL: https://issues.apache.org/jira/browse/SPARK-26918
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Felix Cheung
>Priority: Major
>
> per policy, all md files should have the header, like eg. 
> [https://raw.githubusercontent.com/apache/arrow/master/docs/README.md]
>  
> or
>  
> [https://raw.githubusercontent.com/apache/hadoop/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/filesystem.md]
>  
> currently it does not
> [https://raw.githubusercontent.com/apache/spark/master/docs/sql-reference.md] 






[jira] [Updated] (SPARK-26918) All .md should have ASF license header

2019-02-18 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-26918:
-
Description: 
per policy, all md files should have the header, like eg. 
[https://raw.githubusercontent.com/apache/arrow/master/docs/README.md]

 or

[https://raw.githubusercontent.com/apache/hadoop/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/filesystem.md]

 

currently it does not

[https://raw.githubusercontent.com/apache/spark/master/docs/sql-reference.md] 

  was:
per policy, all md files should have the header, like eg. 
[https://raw.githubusercontent.com/apache/arrow/master/docs/README.md]

 

or

 

[https://raw.githubusercontent.com/apache/hadoop/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/filesystem.md]

 

currently it does not

[https://raw.githubusercontent.com/apache/spark/master/docs/sql-reference.md] 


> All .md should have ASF license header
> --
>
> Key: SPARK-26918
> URL: https://issues.apache.org/jira/browse/SPARK-26918
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Felix Cheung
>Priority: Major
>
> per policy, all md files should have the header, like eg. 
> [https://raw.githubusercontent.com/apache/arrow/master/docs/README.md]
>  or
> [https://raw.githubusercontent.com/apache/hadoop/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/filesystem.md]
>  
> currently it does not
> [https://raw.githubusercontent.com/apache/spark/master/docs/sql-reference.md] 






[jira] [Commented] (SPARK-26910) Re-release SparkR to CRAN

2019-02-18 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16771496#comment-16771496
 ] 

Felix Cheung commented on SPARK-26910:
--

2.3.3 has been submitted to CRAN. We are currently waiting for the test results.

> Re-release SparkR to CRAN
> -
>
> Key: SPARK-26910
> URL: https://issues.apache.org/jira/browse/SPARK-26910
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Affects Versions: 2.4.0
>Reporter: Michael Chirico
>Priority: Major
>
> The logical successor to https://issues.apache.org/jira/browse/SPARK-15799
> I don't see anything specifically tracking re-release in the Jira list. It 
> would be helpful to have an issue tracking this to refer to as an outsider, 
> as well as to document what the blockers are in case some outside help could 
> be useful.
>  * Is there a plan to re-release SparkR to CRAN?
>  * What are the major blockers to doing so at the moment?






[jira] [Commented] (SPARK-26910) Re-release SparkR to CRAN

2019-02-18 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16771498#comment-16771498
 ] 

Felix Cheung commented on SPARK-26910:
--

Once that works, we should look into 2.4.1.

> Re-release SparkR to CRAN
> -
>
> Key: SPARK-26910
> URL: https://issues.apache.org/jira/browse/SPARK-26910
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Affects Versions: 2.4.0
>Reporter: Michael Chirico
>Assignee: Felix Cheung
>Priority: Major
>
> The logical successor to https://issues.apache.org/jira/browse/SPARK-15799
> I don't see anything specifically tracking re-release in the Jira list. It 
> would be helpful to have an issue tracking this to refer to as an outsider, 
> as well as to document what the blockers are in case some outside help could 
> be useful.
>  * Is there a plan to re-release SparkR to CRAN?
>  * What are the major blockers to doing so at the moment?






[jira] [Assigned] (SPARK-26910) Re-release SparkR to CRAN

2019-02-18 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung reassigned SPARK-26910:


Assignee: Felix Cheung

> Re-release SparkR to CRAN
> -
>
> Key: SPARK-26910
> URL: https://issues.apache.org/jira/browse/SPARK-26910
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Affects Versions: 2.4.0
>Reporter: Michael Chirico
>Assignee: Felix Cheung
>Priority: Major
>
> The logical successor to https://issues.apache.org/jira/browse/SPARK-15799
> I don't see anything specifically tracking re-release in the Jira list. It 
> would be helpful to have an issue tracking this to refer to as an outsider, 
> as well as to document what the blockers are in case some outside help could 
> be useful.
>  * Is there a plan to re-release SparkR to CRAN?
>  * What are the major blockers to doing so at the moment?






[jira] [Commented] (SPARK-26858) Vectorized gapplyCollect, Arrow optimization in native R function execution

2019-02-18 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16771494#comment-16771494
 ] 

Felix Cheung commented on SPARK-26858:
--

If I understand correctly, this is the case where Spark actually doesn't care 
much about the schema, but it sounds like Arrow does.

Could we infer the schema from the R data.frame? Is there an equivalent path for 
Python, from Pandas to Arrow?
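
A minimal sketch of the kind of inference being asked about - SparkR can already derive a Spark schema from a local data.frame by round-tripping through createDataFrame(), so something along these lines could in principle seed the Arrow schema for gapplyCollect (illustrative only, not the implementation):

{code:r}
library(SparkR)
sparkR.session()

localDf <- data.frame(key = 1L, mean_wait = 2.5, stringsAsFactors = FALSE)

# schema() returns the structType SparkR infers for the R column types
inferred <- schema(createDataFrame(localDf))
inferred
{code}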

> Vectorized gapplyCollect, Arrow optimization in native R function execution
> ---
>
> Key: SPARK-26858
> URL: https://issues.apache.org/jira/browse/SPARK-26858
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR, SQL
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>
> Unlike gapply, gapplyCollect requires additional ser/de steps because it can 
> omit the schema, and Spark SQL doesn't know the return type before execution 
> actually happens.
> In the original code path, this is done using a binary schema. Once gapply is 
> done (SPARK-26761), we can mimic this approach in vectorized gapply to support 
> gapplyCollect.






[jira] [Commented] (SPARK-26855) SparkSubmitSuite fails on a clean build

2019-02-13 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16767935#comment-16767935
 ] 

Felix Cheung commented on SPARK-26855:
--

Possibly. Sounds like there are more cases like this, and not just in R.

> SparkSubmitSuite fails on a clean build
> ---
>
> Key: SPARK-26855
> URL: https://issues.apache.org/jira/browse/SPARK-26855
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SparkR
>Affects Versions: 2.3.2, 2.4.0
>Reporter: Felix Cheung
>Priority: Major
>
> SparkSubmitSuite
> "include an external JAR in SparkR"
> fails consistently but the test before it, "correctly builds R packages 
> included in a jar with --packages" passes.
> the workaround is to build once with skipTests first, then everything passes.
> ran into this while testing 2.3.3 RC2.






[jira] [Commented] (SPARK-26855) SparkSubmitSuite fails on a clean build

2019-02-11 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16765759#comment-16765759
 ] 

Felix Cheung commented on SPARK-26855:
--

IMO we have two options:
 # document that tests only pass after a clean build with skipTests
 # re-order tests: suppose test A depends on module B being built; we could move 
test A to run after B (or rather, simply make it a test of B)

> SparkSubmitSuite fails on a clean build
> ---
>
> Key: SPARK-26855
> URL: https://issues.apache.org/jira/browse/SPARK-26855
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SparkR
>Affects Versions: 2.3.2, 2.4.0
>Reporter: Felix Cheung
>Priority: Major
>
> SparkSubmitSuite
> "include an external JAR in SparkR"
> fails consistently but the test before it, "correctly builds R packages 
> included in a jar with --packages" passes.
> the workaround is to build once with skipTests first, then everything passes.
> ran into this while testing 2.3.3 RC2.






[jira] [Updated] (SPARK-26855) SparkSubmitSuite fails on a clean build

2019-02-10 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-26855:
-
Description: 
SparkSubmitSuite

"include an external JAR in SparkR"

fails consistently but the test before it, "correctly builds R packages 
included in a jar with --packages" passes.

the workaround is to build once with skipTests first, then everything passes.

ran into this while testing 2.3.3 RC2.

> SparkSubmitSuite fails on a clean build
> ---
>
> Key: SPARK-26855
> URL: https://issues.apache.org/jira/browse/SPARK-26855
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SparkR
>Affects Versions: 2.3.2
>Reporter: Felix Cheung
>Priority: Major
>
> SparkSubmitSuite
> "include an external JAR in SparkR"
> fails consistently but the test before it, "correctly builds R packages 
> included in a jar with --packages" passes.
> the workaround is to build once with skipTests first, then everything passes.
> ran into this while testing 2.3.3 RC2.






[jira] [Created] (SPARK-26855) SparkSubmitSuite fails on a clean build

2019-02-10 Thread Felix Cheung (JIRA)
Felix Cheung created SPARK-26855:


 Summary: SparkSubmitSuite fails on a clean build
 Key: SPARK-26855
 URL: https://issues.apache.org/jira/browse/SPARK-26855
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, SparkR
Affects Versions: 2.3.2
Reporter: Felix Cheung









[jira] [Commented] (SPARK-26762) Arrow optimization for conversion from Spark DataFrame to R DataFrame

2019-02-08 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764092#comment-16764092
 ] 

Felix Cheung commented on SPARK-26762:
--

does this include head, take etc?

> Arrow optimization for conversion from Spark DataFrame to R DataFrame
> -
>
> Key: SPARK-26762
> URL: https://issues.apache.org/jira/browse/SPARK-26762
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR, SQL
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Like SPARK-25981, {{collect(rdf)}} can be optimized via Arrow.






[jira] [Comment Edited] (SPARK-26762) Arrow optimization for conversion from Spark DataFrame to R DataFrame

2019-02-08 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16764092#comment-16764092
 ] 

Felix Cheung edited comment on SPARK-26762 at 2/9/19 7:35 AM:
--

does this include head, take, show etc?


was (Author: felixcheung):
does this include head, take etc?

> Arrow optimization for conversion from Spark DataFrame to R DataFrame
> -
>
> Key: SPARK-26762
> URL: https://issues.apache.org/jira/browse/SPARK-26762
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR, SQL
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Like SPARK-25981, {{collect(rdf)}} can be optimized via Arrow.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26603) Update minikube backend in K8s integration tests

2019-02-03 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung reassigned SPARK-26603:


Assignee: Stavros Kontopoulos

> Update minikube backend in K8s integration tests
> 
>
> Key: SPARK-26603
> URL: https://issues.apache.org/jira/browse/SPARK-26603
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Stavros Kontopoulos
>Assignee: Stavros Kontopoulos
>Priority: Major
>
> Minikube status command has changed 
> ([https://github.com/kubernetes/minikube/commit/cb3624dd089e7ab0c03fbfb81f20c2bde43a60f3#diff-bd0534bbb0703b4170d467d074373788])
>  in the latest releases >0.30.
> Old output:
> {quote}minikube status
>  There is a newer version of minikube available (v0.31.0). Download it here:
>  [https://github.com/kubernetes/minikube/releases/tag/v0.31.0]
> To disable this notification, run the following:
>  minikube config set WantUpdateNotification false
>  minikube: 
>  cluster: 
>  kubectl: 
> {quote}
> new output:
> {quote}minikube status
>  host: Running
>  kubelet: Running
>  apiserver: Running
>  kubectl: Correctly Configured: pointing to minikube-vm at 172.31.34.77
> {quote}
> That means users with latest version of minikube will not be able to run the 
> integration tests.
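
A hedged sketch of a version-tolerant status check (the grep pattern is illustrative, not what 
the integration tests actually run):
{code:bash}
# old releases print "minikube: Running", releases >0.30 print "host: Running"
minikube status | grep -Eq '(minikube|host): *Running' && echo "minikube is up"
{code}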



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26603) Update minikube backend in K8s integration tests

2019-02-03 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung resolved SPARK-26603.
--
   Resolution: Fixed
Fix Version/s: 3.0.0

> Update minikube backend in K8s integration tests
> 
>
> Key: SPARK-26603
> URL: https://issues.apache.org/jira/browse/SPARK-26603
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Stavros Kontopoulos
>Assignee: Stavros Kontopoulos
>Priority: Major
> Fix For: 3.0.0
>
>
> Minikube status command has changed 
> ([https://github.com/kubernetes/minikube/commit/cb3624dd089e7ab0c03fbfb81f20c2bde43a60f3#diff-bd0534bbb0703b4170d467d074373788])
>  in the latest releases >0.30.
> Old output:
> {quote}minikube status
>  There is a newer version of minikube available (v0.31.0). Download it here:
>  [https://github.com/kubernetes/minikube/releases/tag/v0.31.0]
> To disable this notification, run the following:
>  minikube config set WantUpdateNotification false
>  minikube: 
>  cluster: 
>  kubectl: 
> {quote}
> new output:
> {quote}minikube status
>  host: Running
>  kubelet: Running
>  apiserver: Running
>  kubectl: Correctly Configured: pointing to minikube-vm at 172.31.34.77
> {quote}
> That means users with latest version of minikube will not be able to run the 
> integration tests.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24615) Accelerator-aware task scheduling for Spark

2019-01-23 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750751#comment-16750751
 ] 

Felix Cheung commented on SPARK-24615:
--

We are interested to know as well.

 

[~mengxr] touched on this maybe Oct/Nov 2018, but I haven't heard anything else 
since.

> Accelerator-aware task scheduling for Spark
> ---
>
> Key: SPARK-24615
> URL: https://issues.apache.org/jira/browse/SPARK-24615
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Saisai Shao
>Priority: Major
>  Labels: Hydrogen, SPIP
>
> In the machine learning area, accelerator cards (GPU, FPGA, TPU) are 
> predominant compared to CPUs. To make the current Spark architecture work 
> with accelerator cards, Spark itself should understand the existence of 
> accelerators and know how to schedule tasks onto the executors where 
> accelerators are equipped.
> Spark's current scheduler schedules tasks based on the locality of the data 
> plus the availability of CPUs. This will introduce some problems when scheduling 
> tasks that require accelerators.
>  # CPU cores usually outnumber accelerators on one node, so using CPU cores 
> to schedule accelerator-required tasks introduces a mismatch.
>  # In one cluster, we always assume that a CPU is equipped in each node, but 
> this is not true of accelerator cards.
>  # The existence of heterogeneous tasks (accelerator required or not) 
> requires the scheduler to schedule tasks in a smart way.
> So here we propose to improve the current scheduler to support heterogeneous 
> tasks (accelerator required or not). This can be part of the work of Project 
> Hydrogen.
> Details are attached in the Google doc. It doesn't cover all the implementation 
> details, just highlights the parts that should be changed.
>  
> CC [~yanboliang] [~merlintang]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26679) Deconflict spark.executor.pyspark.memory and spark.python.worker.memory

2019-01-23 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16750696#comment-16750696
 ] 

Felix Cheung commented on SPARK-26679:
--

I find the fraction configs very confusing, and there is a lot of misinformation 
about them as well as wrongly hardcoded configs. Anyway...

> Deconflict spark.executor.pyspark.memory and spark.python.worker.memory
> ---
>
> Key: SPARK-26679
> URL: https://issues.apache.org/jira/browse/SPARK-26679
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 2.4.0
>Reporter: Ryan Blue
>Priority: Major
>
> In 2.4.0, spark.executor.pyspark.memory was added to limit the total memory 
> space of a python worker. There is another RDD setting, 
> spark.python.worker.memory that controls when Spark decides to spill data to 
> disk. These are currently similar, but not related to one another.
> PySpark should probably use spark.executor.pyspark.memory to limit or default 
> the setting of spark.python.worker.memory because the latter property 
> controls spilling and should be lower than the total memory limit. Renaming 
> spark.python.worker.memory would also help clarity because it sounds like it 
> should control the limit, but is more like the JVM setting 
> spark.memory.fraction.
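
For illustration, a hedged sketch of how the two settings are passed today; both conf keys 
exist as of 2.4.0, the values are arbitrary, and my_job.py is a placeholder:
{code:bash}
spark-submit \
  --conf spark.executor.pyspark.memory=2g \
  --conf spark.python.worker.memory=512m \
  my_job.py
# the first caps the python worker's total memory (new in 2.4.0); the second only sets
# the point at which RDD aggregation spills to disk. Nothing ties the second to the
# first, which is the conflict described above.
{code}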



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-26642) Add --num-executors option to spark-submit for Spark on K8S

2019-01-20 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung reassigned SPARK-26642:


Assignee: Luca Canali

> Add --num-executors option to spark-submit for Spark on K8S
> ---
>
> Key: SPARK-26642
> URL: https://issues.apache.org/jira/browse/SPARK-26642
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, Spark Core
>Affects Versions: 3.0.0
>Reporter: Luca Canali
>Assignee: Luca Canali
>Priority: Trivial
>
> Currently spark-submit supports the option --num-executors NUM only for Spark 
> on YARN. Users running Spark on K8S can specify the requested number of 
> executors in spark-submit with --conf spark.executor.instances=NUM
> This proposes to extend the spark-submit option --num-executors to be 
> applicable to Spark on K8S too. It is motivated by convenience, for example 
> when migrating jobs written for YARN to run on K8S.
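
As an illustration (the master URL is a placeholder and the trailing ... stands for the 
application jar and its arguments), the proposal would make the second form below behave the 
way the first already does on K8S:
{code:bash}
# today on K8S: the executor count is only picked up from the conf key
spark-submit --master k8s://https://example.com:6443 \
  --conf spark.executor.instances=5 ...

# proposed: --num-executors honored on K8S as well, matching YARN
spark-submit --master k8s://https://example.com:6443 --num-executors 5 ...
{code}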



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26642) Add --num-executors option to spark-submit for Spark on K8S

2019-01-20 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung resolved SPARK-26642.
--
   Resolution: Fixed
Fix Version/s: 3.0.0

> Add --num-executors option to spark-submit for Spark on K8S
> ---
>
> Key: SPARK-26642
> URL: https://issues.apache.org/jira/browse/SPARK-26642
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, Spark Core
>Affects Versions: 3.0.0
>Reporter: Luca Canali
>Assignee: Luca Canali
>Priority: Trivial
> Fix For: 3.0.0
>
>
> Currently spark-submit supports the option --num-executors NUM only for Spark 
> on YARN. Users running Spark on K8S can specify the requested number of 
> executors in spark-submit with --conf spark.executor.instances=NUM
> This proposes to extend the spark-submit option --num-executors to be 
> applicable to Spark on K8S too. It is motivated by convenience, for example 
> when migrating jobs written for YARN to run on K8S.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26565) modify dev/create-release/release-build.sh to let jenkins build packages w/o publishing

2019-01-14 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742845#comment-16742845
 ] 

Felix Cheung commented on SPARK-26565:
--

ok AFAIK

the first `cannot stat` is from this line

 
{code:java}
# Move R source package to match the Spark release version if the versions are 
not the same.
# NOTE(shivaram): `mv` throws an error on Linux if source and destination are 
same file
if [ "$R_PACKAGE_VERSION" != "$VERSION" ]; then
mv "$SPARK_HOME/R/SparkR_$R_PACKAGE_VERSION.tar.gz" 
"$SPARK_HOME/R/SparkR_$VERSION.tar.gz"
fi
{code}
 

which should rename the tarball to SparkR_3.0.0-SNAPSHOT.tar.gz

(maybe it wasn't built?)

the second is from
{code:java}
# Remove the python distribution from dist/ if we built it
if [ "$MAKE_PIP" == "true" ]; then
rm -f "$DISTDIR"/python/dist/pyspark-*.tar.gz
fi

{code}
which is fine; it isn't the output we are looking for (not under dist)

 

 

> modify dev/create-release/release-build.sh to let jenkins build packages w/o 
> publishing
> ---
>
> Key: SPARK-26565
> URL: https://issues.apache.org/jira/browse/SPARK-26565
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.2.3, 2.3.3, 2.4.1, 3.0.0
>Reporter: shane knapp
>Assignee: shane knapp
>Priority: Major
> Attachments: fine.png, no-idea.jpg
>
>
> about a year+ ago, we stopped publishing releases directly from jenkins...
> this means that the spark-\{branch}-packaging builds are failing due to gpg 
> signing failures, and i would like to update these builds to *just* perform 
> packaging.
> example:
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-package/2183/console]
> i propose to change dev/create-release/release-build.sh...
> when the script is called w/the 'package' option, add an {{if}} statement to 
> skip the following sections when run on jenkins:
> 1) gpg signing of the source tarball (lines 184-187)
> 2) gpg signing of the sparkR dist (lines 243-248)
> 3) gpg signing of the python dist (lines 256-261)
> 4) gpg signing of the regular binary dist (lines 264-271)
> 5) the svn push of the signed dists (lines 317-332)
>  
> -another, and probably much better option, is to nuke the 
> spark-\{branch}-packaging builds and create new ones that just build things 
> w/o touching this incredibly fragile shell scripting nightmare.-
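
A hedged sketch of the kind of guard being proposed; JENKINS_URL, ARTIFACT and GPG_PASSPHRASE 
are placeholders here (Jenkins does export JENKINS_URL, but the build could key off any 
agreed-upon variable), and the gpg call is only modeled on the script's existing signing steps:
{code:bash}
# illustrative: sign only when not running under jenkins
if [[ -z "$JENKINS_URL" ]]; then
  echo "$GPG_PASSPHRASE" | gpg --passphrase-fd 0 --armor \
    --output "$ARTIFACT.asc" --detach-sig "$ARTIFACT"
fi
{code}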



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26120) Fix a streaming query leak in Structured Streaming R tests

2019-01-12 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-26120:
-
Fix Version/s: 2.3.3

> Fix a streaming query leak in Structured Streaming R tests
> --
>
> Key: SPARK-26120
> URL: https://issues.apache.org/jira/browse/SPARK-26120
> Project: Spark
>  Issue Type: Test
>  Components: SparkR, Structured Streaming, Tests
>Affects Versions: 2.4.0
>Reporter: Shixiong Zhu
>Assignee: Shixiong Zhu
>Priority: Minor
> Fix For: 2.3.3, 2.4.1, 3.0.0
>
>
> "Specify a schema by using a DDL-formatted string when reading" doesn't stop 
> the streaming query before stopping Spark. It causes the following annoying 
> logs.
> {code}
> Exception in thread "stream execution thread for [id = 
> 186dad10-e87f-4155-8119-00e0e63bbc1a, runId = 
> 2c0cc158-410b-442f-ac36-20f80ec429b1]" Exception in thread "stream execution 
> thread for people3 [id = ffa6136d-fe7b-4777-aa47-b0cb64d07ea4, runId = 
> 644b888e-9cce-4a09-bb5e-2fb122796c19]" org.apache.spark.SparkException: 
> Exception thrown in awaitResult: 
>   at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:355)
>   at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
>   at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:92)
>   at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:76)
>   at 
> org.apache.spark.sql.execution.streaming.state.StateStoreCoordinatorRef.deactivateInstances(StateStoreCoordinator.scala:108)
>   at 
> org.apache.spark.sql.streaming.StreamingQueryManager.notifyQueryTermination(StreamingQueryManager.scala:399)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runStream$2.apply(StreamExecution.scala:342)
>   at 
> org.apache.spark.util.UninterruptibleThread.runUninterruptibly(UninterruptibleThread.scala:77)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:323)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:204)
> Caused by: org.apache.spark.rpc.RpcEnvStoppedException: RpcEnv already 
> stopped.
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:158)
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:135)
>   at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:229)
>   at 
> org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:523)
>   at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:91)
>   ... 7 more
> org.apache.spark.SparkException: Exception thrown in awaitResult: 
>   at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:355)
>   at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
>   at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:92)
>   at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:76)
>   at 
> org.apache.spark.sql.execution.streaming.state.StateStoreCoordinatorRef.deactivateInstances(StateStoreCoordinator.scala:108)
>   at 
> org.apache.spark.sql.streaming.StreamingQueryManager.notifyQueryTermination(StreamingQueryManager.scala:399)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runStream$2.apply(StreamExecution.scala:342)
>   at 
> org.apache.spark.util.UninterruptibleThread.runUninterruptibly(UninterruptibleThread.scala:77)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:323)
>   at 
> org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:204)
> Caused by: org.apache.spark.rpc.RpcEnvStoppedException: RpcEnv already 
> stopped.
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:158)
>   at 
> org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:135)
>   at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:229)
>   at 
> org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:523)
>   at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:91)
>   ... 7 more
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26565) modify dev/create-release/release-build.sh to let jenkins build packages w/o publishing

2019-01-12 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16741453#comment-16741453
 ] 

Felix Cheung commented on SPARK-26565:
--

Yeah, my point wasn't to allow access to unsigned releases, but to help the RM 
check out the built packages before kicking off the RC process.

For example, oftentimes the build completes successfully but there is some 
issue with the content.


> modify dev/create-release/release-build.sh to let jenkins build packages w/o 
> publishing
> ---
>
> Key: SPARK-26565
> URL: https://issues.apache.org/jira/browse/SPARK-26565
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.2.3, 2.3.3, 2.4.1, 3.0.0
>Reporter: shane knapp
>Assignee: shane knapp
>Priority: Major
> Attachments: fine.png, no-idea.jpg
>
>
> about a year+ ago, we stopped publishing releases directly from jenkins...
> this means that the spark-\{branch}-packaging builds are failing due to gpg 
> signing failures, and i would like to update these builds to *just* perform 
> packaging.
> example:
> [https://amplab.cs.berkeley.edu/jenkins/view/Spark%20Packaging/job/spark-master-package/2183/console]
> i propose to change dev/create-release/release-build.sh...
> when the script is called w/the 'package' option, add an {{if}} statement to 
> skip the following sections when run on jenkins:
> 1) gpg signing of the source tarball (lines 184-187)
> 2) gpg signing of the sparkR dist (lines 243-248)
> 3) gpg signing of the python dist (lines 256-261)
> 4) gpg signing of the regular binary dist (lines 264-271)
> 5) the svn push of the signed dists (lines 317-332)
>  
> -another, and probably much better option, is to nuke the 
> spark-\{branch}-packaging builds and create new ones that just build things 
> w/o touching this incredibly fragile shell scripting nightmare.-



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25572) SparkR tests failed on CRAN on Java 10

2019-01-12 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-25572:
-
Fix Version/s: 2.3.3

> SparkR tests failed on CRAN on Java 10
> --
>
> Key: SPARK-25572
> URL: https://issues.apache.org/jira/browse/SPARK-25572
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.4.0
>Reporter: Felix Cheung
>Assignee: Felix Cheung
>Priority: Major
> Fix For: 2.3.3, 2.4.0
>
>
> follow up to SPARK-24255
> From the 2.3.2 release we can see that CRAN doesn't seem to respect the system 
> requirements when running tests - we have seen cases where SparkR is run on 
> Java 10, which unfortunately Spark does not start on. For 2.4.x, let's attempt 
> skipping all tests



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26010) SparkR vignette fails on CRAN on Java 11

2019-01-12 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-26010:
-
Fix Version/s: 2.3.3

> SparkR vignette fails on CRAN on Java 11
> 
>
> Key: SPARK-26010
> URL: https://issues.apache.org/jira/browse/SPARK-26010
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Felix Cheung
>Assignee: Felix Cheung
>Priority: Major
> Fix For: 2.3.3, 2.4.1, 3.0.0
>
>
> follow up to SPARK-25572
> but for vignettes
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26247) SPIP - ML Model Extension for no-Spark MLLib Online Serving

2018-12-02 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-26247:
-
Description: 
This ticket tracks an SPIP to improve model load time and model serving 
interfaces for online serving of Spark MLlib models.  The SPIP is here

[https://docs.google.com/a/uber.com/document/d/e/2PACX-1vRttVNNMBt4pBU2oBWKoiK3-7PW6RDwvHNgSMqO67ilxTX_WUStJ2ysUdAk5Im08eyHvlpcfq1g-DLF/pub]

 

The improvement opportunity exists in all versions of spark.  We developed our 
set of changes wrt version 2.1.0 and can port them forward to other versions 
(e.g., we have ported them forward to 2.3.2).

  was:
This ticket tracks an SPIP to improve model load time and model serving 
interfaces for online serving of Spark MLlib models.  The SPIP is here

[https://docs.google.com/a/uber.com/document/d/e/2PACX-1vRttVNNMBt4pBU2oBWKoiK3-7PW6RDwvHNgSMqO67ilxTX_WUStJ2ysUdAk5Im08eyHvlpcfq1g-DLF/pub]

 

The improvement opportunity exists in all versions of spark.  We developed our 
set of changes wrt version 2.1.0 and can port them forward to other versions 
(e.g., wehave ported them forward to 2.3.2).


> SPIP - ML Model Extension for no-Spark MLLib Online Serving
> ---
>
> Key: SPARK-26247
> URL: https://issues.apache.org/jira/browse/SPARK-26247
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 2.1.0
>Reporter: Anne Holler
>Priority: Major
>  Labels: SPIP
>
> This ticket tracks an SPIP to improve model load time and model serving 
> interfaces for online serving of Spark MLlib models.  The SPIP is here
> [https://docs.google.com/a/uber.com/document/d/e/2PACX-1vRttVNNMBt4pBU2oBWKoiK3-7PW6RDwvHNgSMqO67ilxTX_WUStJ2ysUdAk5Im08eyHvlpcfq1g-DLF/pub]
>  
> The improvement opportunity exists in all versions of spark.  We developed 
> our set of changes wrt version 2.1.0 and can port them forward to other 
> versions (e.g., we have ported them forward to 2.3.2).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26247) SPIP - ML Model Extension for no-Spark MLLib Online Serving

2018-12-02 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-26247:
-
Target Version/s: 3.0.0  (was: 2.1.0)

> SPIP - ML Model Extension for no-Spark MLLib Online Serving
> ---
>
> Key: SPARK-26247
> URL: https://issues.apache.org/jira/browse/SPARK-26247
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 2.1.0
>Reporter: Anne Holler
>Priority: Major
>  Labels: SPIP
>
> This ticket tracks an SPIP to improve model load time and model serving 
> interfaces for online serving of Spark MLlib models.  The SPIP is here
> [https://docs.google.com/a/uber.com/document/d/e/2PACX-1vRttVNNMBt4pBU2oBWKoiK3-7PW6RDwvHNgSMqO67ilxTX_WUStJ2ysUdAk5Im08eyHvlpcfq1g-DLF/pub]
>  
> The improvement opportunity exists in all versions of spark.  We developed 
> our set of changes wrt version 2.1.0 and can port them forward to other 
> versions (e.g., wehave ported them forward to 2.3.2).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26247) SPIP - ML Model Extension for no-Spark MLLib Online Serving

2018-12-02 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-26247:
-
Fix Version/s: (was: 2.1.0)

> SPIP - ML Model Extension for no-Spark MLLib Online Serving
> ---
>
> Key: SPARK-26247
> URL: https://issues.apache.org/jira/browse/SPARK-26247
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 2.1.0
>Reporter: Anne Holler
>Priority: Major
>  Labels: SPIP
>
> This ticket tracks an SPIP to improve model load time and model serving 
> interfaces for online serving of Spark MLlib models.  The SPIP is here
> [https://docs.google.com/a/uber.com/document/d/e/2PACX-1vRttVNNMBt4pBU2oBWKoiK3-7PW6RDwvHNgSMqO67ilxTX_WUStJ2ysUdAk5Im08eyHvlpcfq1g-DLF/pub]
>  
> The improvement opportunity exists in all versions of spark.  We developed 
> our set of changes wrt version 2.1.0 and can port them forward to other 
> versions (e.g., wehave ported them forward to 2.3.2).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26189) Fix the doc of unionAll in SparkR

2018-11-30 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung resolved SPARK-26189.
--
  Resolution: Fixed
Assignee: Huaxin Gao
   Fix Version/s: 3.0.0
Target Version/s: 3.0.0

> Fix the doc of unionAll in SparkR
> -
>
> Key: SPARK-26189
> URL: https://issues.apache.org/jira/browse/SPARK-26189
> Project: Spark
>  Issue Type: Documentation
>  Components: R
>Affects Versions: 3.0.0
>Reporter: Xiao Li
>Assignee: Huaxin Gao
>Priority: Minor
> Fix For: 3.0.0
>
>
> We should fix the doc of unionAll in SparkR. See the discussion: 
> https://github.com/apache/spark/pull/23131/files#r236760822



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21291) R bucketBy partitionBy API

2018-11-28 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16701505#comment-16701505
 ] 

Felix Cheung commented on SPARK-21291:
--

hmm, ok

> R bucketBy partitionBy API
> --
>
> Key: SPARK-21291
> URL: https://issues.apache.org/jira/browse/SPARK-21291
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Affects Versions: 2.2.0
>Reporter: Felix Cheung
>Assignee: Huaxin Gao
>Priority: Major
> Fix For: 3.0.0
>
>
> partitionBy exists but it's for windowspec only



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21291) R bucketBy partitionBy API

2018-11-27 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16700743#comment-16700743
 ] 

Felix Cheung commented on SPARK-21291:
--

I think we need to reopen this Jira since bucketBy is not addressed.

> R bucketBy partitionBy API
> --
>
> Key: SPARK-21291
> URL: https://issues.apache.org/jira/browse/SPARK-21291
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Affects Versions: 2.2.0
>Reporter: Felix Cheung
>Assignee: Huaxin Gao
>Priority: Major
> Fix For: 3.0.0
>
>
> partitionBy exists but it's for windowspec only



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24255) Require Java 8 in SparkR description

2018-11-12 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16684668#comment-16684668
 ] 

Felix Cheung commented on SPARK-24255:
--

[~shivaram] I'm wondering whether this handles all version cases?

*[kiszk|https://github.com/kiszk]* found the following with various java versions:
{code:java}
$ ../OpenJDK-8/java -version
java version "1.8.0_162"
Java(TM) SE Runtime Environment (build 1.8.0_162-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.162-b12, mixed mode)

$ ../OpenJDK-8/java Version
jave.specification.version=1.8
jave.version=1.8.0_162
jave.version.split(".")[0]=1

$ ../OpenJDK-9/java -version
openjdk version "9"
OpenJDK Runtime Environment (build 9+181)
OpenJDK 64-Bit Server VM (build 9+181, mixed mode)

$ ../OpenJDK-9/java Version
jave.specification.version=9
jave.version=9
jave.version.split(".")[0]=9

$ ../OpenJDK-11/java -version
openjdk version "11.0.1" 2018-10-16
OpenJDK Runtime Environment 18.9 (build 11.0.1+13)
OpenJDK 64-Bit Server VM 18.9 (build 11.0.1+13, mixed mode)

$ ../OpenJDK-11/java Version
jave.specification.version=11
jave.version=11.0.1
jave.version.split(".")[0]=11


$ ../OpenJ9-8/java -version
openjdk version "1.8.0_192"
OpenJDK Runtime Environment (build 1.8.0_192-b12)
Eclipse OpenJ9 VM (build openj9-0.11.0, JRE 1.8.0 Windows 10 amd64-64-Bit 
Compressed References 20181019_105 (JIT enabled, AOT enabled)
OpenJ9   - 090ff9dc
OMR  - ea548a66
JCL  - 51609250b5 based on jdk8u192-b12)

$ ../OpenJ9-8/java Version
jave.specification.version=1.8
jave.version=1.8.0_192
jave.version.split(".")[0]=1

$ ../OpenJ9-9/java -version
openjdk version "9.0.4-adoptopenjdk"
OpenJDK Runtime Environment (build 9.0.4-adoptopenjdk+12)
Eclipse OpenJ9 VM (build openj9-0.9.0, JRE 9 Windows 8.1 amd64-64-Bit 
Compressed References 20180814_161 (JIT enabled, AOT enabled)
OpenJ9   - 24e53631
OMR  - fad6bf6e
JCL  - feec4d2ae based on jdk-9.0.4+12)

$ ../OpenJ9-9/java Version
jave.specification.version=9
jave.version=9.0.4-adoptopenjdk
jave.version.split(".")[0]=9


$ ../OpenJ9-11/java -version
openjdk version "11.0.1" 2018-10-16
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.1+13)
Eclipse OpenJ9 VM AdoptOpenJDK (build openj9-0.11.0, JRE 11 Windows 10 
amd64-64-Bit Compressed References 20181020_83 (JIT enabled, AOT enabled)
OpenJ9   - 090ff9dc
OMR  - ea548a66
JCL  - f62696f378 based on jdk-11.0.1+13)

$ ../OpenJ9-11/java Version
jave.specification.version=11
jave.version=11.0.1
jave.version.split(".")[0]=11


$ ../IBMJDK-8/java -version
java version "1.8.0"
Java(TM) SE Runtime Environment (build pwa6480-20150129_02)
IBM J9 VM (build 2.8, JRE 1.8.0 Windows 8.1 amd64-64 Compressed References 
20150116_231420 (JIT enabled, AOT enabled)
J9VM - R28_Java8_GA_20150116_2030_B231420
JIT  - tr.r14.java_20150109_82886.02
GC   - R28_Java8_GA_20150116_2030_B231420_CMPRSS
J9CL - 20150116_231420)
JCL - 20150123_01 based on Oracle jdk8u31-b12

$ ../IBMJDK-8/java Version
jave.specification.version=1.8
jave.version=1.8.0
jave.version.split(".")[0]=1
{code}
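
To summarize the pattern above, a hedged shell sketch of one way to normalize the two 
schemes (illustrative only; SparkR's actual check is done on the R side):
{code:bash}
# read java.specification.version and map 1.8 -> 8, 9 -> 9, 11 -> 11
spec=$(java -XshowSettings:properties -version 2>&1 \
  | awk -F'= ' '/java.specification.version/ {print $2}')
major=${spec#1.}        # drop the legacy "1." prefix if present
major=${major%%.*}      # keep only the major component
echo "$major"
{code}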

> Require Java 8 in SparkR description
> 
>
> Key: SPARK-24255
> URL: https://issues.apache.org/jira/browse/SPARK-24255
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.3.0
>Reporter: Shivaram Venkataraman
>Assignee: Shivaram Venkataraman
>Priority: Major
> Fix For: 2.3.1, 2.4.0
>
>
> CRAN checks require that the Java version be set both in package description 
> and checked during runtime.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26010) SparkR vignette fails on CRAN on Java 11

2018-11-12 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung resolved SPARK-26010.
--
   Resolution: Fixed
 Assignee: Felix Cheung
Fix Version/s: 3.0.0
   2.4.1

> SparkR vignette fails on CRAN on Java 11
> 
>
> Key: SPARK-26010
> URL: https://issues.apache.org/jira/browse/SPARK-26010
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Felix Cheung
>Assignee: Felix Cheung
>Priority: Major
> Fix For: 2.4.1, 3.0.0
>
>
> follow up to SPARK-25572
> but for vignettes
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26010) SparkR vignette fails on Java 11

2018-11-11 Thread Felix Cheung (JIRA)
Felix Cheung created SPARK-26010:


 Summary: SparkR vignette fails on Java 11
 Key: SPARK-26010
 URL: https://issues.apache.org/jira/browse/SPARK-26010
 Project: Spark
  Issue Type: Bug
  Components: SparkR
Affects Versions: 2.4.0, 3.0.0
Reporter: Felix Cheung


follow up to SPARK-25572

but for vignettes

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26010) SparkR vignette fails on CRAN on Java 11

2018-11-11 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-26010:
-
Summary: SparkR vignette fails on CRAN on Java 11  (was: SparkR vignette 
fails on Java 11)

> SparkR vignette fails on CRAN on Java 11
> 
>
> Key: SPARK-26010
> URL: https://issues.apache.org/jira/browse/SPARK-26010
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.4.0, 3.0.0
>Reporter: Felix Cheung
>Priority: Major
>
> follow up to SPARK-25572
> but for vignettes
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25995) sparkR should ensure user args are after the argument used for the port

2018-11-10 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16682762#comment-16682762
 ] 

Felix Cheung commented on SPARK-25995:
--

sparkR is just taking the whole string as-is:

[https://github.com/apache/spark/blob/141953f4c44dbad1c2a7059e92bec5fe770af932/R/pkg/R/client.R#L59]

You can see that sparkSubmitOpts comes before args (args is the file with the port number).

I think we should avoid duplicating the submit arg parsing in R, which we would 
need in order to break out
{code:java}
fooarg
{code}
?

Is it easier/better to always set the temp file with the port as the last arg 
instead?

 

 

> sparkR should ensure user args are after the argument used for the port
> ---
>
> Key: SPARK-25995
> URL: https://issues.apache.org/jira/browse/SPARK-25995
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.3.2
>Reporter: Thomas Graves
>Priority: Minor
>
> Currently if you run sparkR and accidentally specify an argument, it fails 
> with a useless error message.  For example:
> $SPARK_HOME/bin/sparkR  --master yarn --deploy-mode client fooarg
> This gets turned into:
> Launching java with spark-submit command spark-submit   "--master" "yarn" 
> "--deploy-mode" "client" "sparkr-shell" "fooarg" 
> /tmp/Rtmp6XBGz2/backend_port162806ea36bca
> Notice that "fooarg" got put before the /tmp file, which is how R and the JVM know 
> which port to connect to. SparkR eventually fails with a timeout exception 
> after 10 seconds.
>  
> SparkR should either not allow args or make sure the order is correct so the 
> backend_port is always first. see 
> https://github.com/apache/spark/blob/master/R/pkg/R/sparkR.R#L129



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25923) SparkR UT Failure (checking CRAN incoming feasibility)

2018-11-03 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16674216#comment-16674216
 ] 

Felix Cheung commented on SPARK-25923:
--

thanks - what's the exchange required with CRAN admin?

> SparkR UT Failure (checking CRAN incoming feasibility)
> --
>
> Key: SPARK-25923
> URL: https://issues.apache.org/jira/browse/SPARK-25923
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Liang-Chi Hsieh
>Priority: Blocker
>
> Currently, the following SparkR error blocks PR builders.
> {code:java}
> * checking CRAN incoming feasibility ...Error in 
> .check_package_CRAN_incoming(pkgdir) : 
>   dims [product 26] do not match the length of object [0]
> Execution halted
> {code}
> - 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98362/console
> - 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98367/console
> - 
> https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98368/testReport/
> - 
> https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4403/testReport/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25859) add scala/java/python example and doc for PrefixSpan

2018-10-27 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung resolved SPARK-25859.
--
  Resolution: Fixed
Assignee: Huaxin Gao
   Fix Version/s: 2.4.0
Target Version/s: 2.4.0

> add scala/java/python example and doc for PrefixSpan
> 
>
> Key: SPARK-25859
> URL: https://issues.apache.org/jira/browse/SPARK-25859
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 2.4.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Major
> Fix For: 2.4.0
>
>
> scala/java/python examples and doc for PrefixSpan are added in 3.0 in 
> https://issues.apache.org/jira/browse/SPARK-24207. This jira is to add the 
> examples and doc in 2.4.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-16693) Remove R deprecated methods

2018-10-27 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-16693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung resolved SPARK-16693.
--
   Resolution: Fixed
 Assignee: Felix Cheung
Fix Version/s: 3.0.0

> Remove R deprecated methods
> ---
>
> Key: SPARK-16693
> URL: https://issues.apache.org/jira/browse/SPARK-16693
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.0.0
>Reporter: Felix Cheung
>Assignee: Felix Cheung
>Priority: Major
> Fix For: 3.0.0
>
>
> For methods deprecated in Spark 2.0.0, we should remove them in 2.1.0 -> 3.0.0



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12172) Consider removing SparkR internal RDD APIs

2018-10-26 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-12172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16665908#comment-16665908
 ] 

Felix Cheung commented on SPARK-12172:
--

sounds good

> Consider removing SparkR internal RDD APIs
> --
>
> Key: SPARK-12172
> URL: https://issues.apache.org/jira/browse/SPARK-12172
> Project: Spark
>  Issue Type: Task
>  Components: SparkR
>Reporter: Felix Cheung
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-15545) R remove non-exported unused methods, like jsonRDD

2018-10-25 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-15545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung resolved SPARK-15545.
--
Resolution: Duplicate

> R remove non-exported unused methods, like jsonRDD
> --
>
> Key: SPARK-15545
> URL: https://issues.apache.org/jira/browse/SPARK-15545
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.3.2
>Reporter: Felix Cheung
>Priority: Minor
>
> Need to review what should be removed.
> one reason to not remove this right away is because we have been talking 
> about calling internal methods via `SparkR:::jsonRDD` for this and other RDD 
> methods.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-15545) R remove non-exported unused methods, like jsonRDD

2018-10-25 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-15545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-15545:
-
Affects Version/s: 2.3.2
External issue ID: SPARK-12172

> R remove non-exported unused methods, like jsonRDD
> --
>
> Key: SPARK-15545
> URL: https://issues.apache.org/jira/browse/SPARK-15545
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.3.2
>Reporter: Felix Cheung
>Priority: Minor
>
> Need to review what should be removed.
> one reason to not remove this right away is because we have been talking 
> about calling internal methods via `SparkR:::jsonRDD` for this and other RDD 
> methods.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-12172) Consider removing SparkR internal RDD APIs

2018-10-25 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-12172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664631#comment-16664631
 ] 

Felix Cheung edited comment on SPARK-12172 at 10/26/18 4:11 AM:


ok, what's our option for spark.lapply?

I'll consider at least removing all other methods that are not used for 
spark.lapply in spark 3.0.0


was (Author: felixcheung):
ok, what's our option for spark.lapply?

> Consider removing SparkR internal RDD APIs
> --
>
> Key: SPARK-12172
> URL: https://issues.apache.org/jira/browse/SPARK-12172
> Project: Spark
>  Issue Type: Task
>  Components: SparkR
>Reporter: Felix Cheung
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12172) Consider removing SparkR internal RDD APIs

2018-10-25 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-12172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664631#comment-16664631
 ] 

Felix Cheung commented on SPARK-12172:
--

ok, what's our option for spark.lapply?

> Consider removing SparkR internal RDD APIs
> --
>
> Key: SPARK-12172
> URL: https://issues.apache.org/jira/browse/SPARK-12172
> Project: Spark
>  Issue Type: Task
>  Components: SparkR
>Reporter: Felix Cheung
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16611) Expose several hidden DataFrame/RDD functions

2018-10-25 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-16611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664628#comment-16664628
 ] 

Felix Cheung commented on SPARK-16611:
--

ping - we are going to consider removing RDD methods in spark 3.0.0

> Expose several hidden DataFrame/RDD functions
> -
>
> Key: SPARK-16611
> URL: https://issues.apache.org/jira/browse/SPARK-16611
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Reporter: Oscar D. Lara Yejas
>Priority: Major
>
> Expose the following functions:
> - lapply or map
> - lapplyPartition or mapPartition
> - flatMap
> - RDD
> - toRDD
> - getJRDD
> - cleanup.jobj
> cc:
> [~javierluraschi] [~j...@rstudio.com] [~shivaram]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16611) Expose several hidden DataFrame/RDD functions

2018-10-25 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-16611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664630#comment-16664630
 ] 

Felix Cheung commented on SPARK-16611:
--

see SPARK-12172

> Expose several hidden DataFrame/RDD functions
> -
>
> Key: SPARK-16611
> URL: https://issues.apache.org/jira/browse/SPARK-16611
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Reporter: Oscar D. Lara Yejas
>Priority: Major
>
> Expose the following functions:
> - lapply or map
> - lapplyPartition or mapPartition
> - flatMap
> - RDD
> - toRDD
> - getJRDD
> - cleanup.jobj
> cc:
> [~javierluraschi] [~j...@rstudio.com] [~shivaram]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16693) Remove R deprecated methods

2018-10-25 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-16693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16664626#comment-16664626
 ] 

Felix Cheung commented on SPARK-16693:
--

rebuilt this on spark 3.0.0

> Remove R deprecated methods
> ---
>
> Key: SPARK-16693
> URL: https://issues.apache.org/jira/browse/SPARK-16693
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.0.0
>Reporter: Felix Cheung
>Priority: Major
>
> For methods deprecated in Spark 2.0.0, we should remove them in 2.1.0 -> 3.0.0



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16693) Remove R deprecated methods

2018-10-25 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-16693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-16693:
-
Description: For methods deprecated in Spark 2.0.0, we should remove them 
in 2.1.0 -> 3.0.0  (was: For methods deprecated in Spark 2.0.0, we should 
remove them in 2.1.0)

> Remove R deprecated methods
> ---
>
> Key: SPARK-16693
> URL: https://issues.apache.org/jira/browse/SPARK-16693
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.0.0
>Reporter: Felix Cheung
>Priority: Major
>
> For methods deprecated in Spark 2.0.0, we should remove them in 2.1.0 -> 3.0.0



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-24572) "eager execution" for R shell, IDE

2018-10-25 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung resolved SPARK-24572.
--
  Resolution: Fixed
Assignee: Weiqiang Zhuang
   Fix Version/s: 3.0.0
Target Version/s: 3.0.0

> "eager execution" for R shell, IDE
> --
>
> Key: SPARK-24572
> URL: https://issues.apache.org/jira/browse/SPARK-24572
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.4.0
>Reporter: Felix Cheung
>Assignee: Weiqiang Zhuang
>Priority: Major
> Fix For: 3.0.0
>
>
> Like Python in SPARK-24215,
> we could also have eager execution when a SparkDataFrame is returned to the R 
> shell
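
For reference, a hedged sketch of what enabling it might look like, assuming the R side 
reuses the conf key introduced for Python in SPARK-24215:
{code:bash}
# illustrative: eager (pretty-printed) evaluation of SparkDataFrames in the R shell
./bin/sparkR --conf spark.sql.repl.eagerEval.enabled=true
{code}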



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-24516) PySpark Bindings for K8S - make Python 3 the default

2018-10-25 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung resolved SPARK-24516.
--
  Resolution: Fixed
Assignee: Ilan Filonenko
   Fix Version/s: 3.0.0
Target Version/s: 3.0.0

> PySpark Bindings for K8S - make Python 3 the default
> 
>
> Key: SPARK-24516
> URL: https://issues.apache.org/jira/browse/SPARK-24516
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, PySpark
>Affects Versions: 2.4.0
>Reporter: Ondrej Kokes
>Assignee: Ilan Filonenko
>Priority: Minor
> Fix For: 3.0.0
>
>
> Initial PySpark-k8s bindings have just been resolved (SPARK-23984), but the 
> default Python version there is 2. While you can override this by setting it 
> to 3, I think we should have sensible defaults.
> Python 3 has been around for ten years and is the clear successor; Python 2 
> has only 18 months left in terms of support. There isn't a good reason to 
> suggest Python 2 should be used, not in 2018 and not when both versions are 
> supported.
> The relevant commit [is 
> here|https://github.com/apache/spark/commit/1a644afbac35c204f9ad55f86999319a9ab458c6#diff-6e882d5561424e7e6651eb46f10104b8R194],
>  the version is also [in the 
> documentation|https://github.com/apache/spark/commit/1a644afbac35c204f9ad55f86999319a9ab458c6#diff-b5527f236b253e0d9f5db5164bdb43e9R643].
>  
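
For reference, a hedged sketch of the override at submit time; the master URL and my_job.py 
are placeholders, and the conf key is the 2.4-era property from the linked commit:
{code:bash}
# illustrative: force Python 3 for a PySpark job on K8S in 2.4
spark-submit --master k8s://https://example.com:6443 \
  --conf spark.kubernetes.pyspark.pythonVersion=3 \
  my_job.py
{code}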



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-22947) SPIP: as-of join in Spark SQL

2018-10-21 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-22947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658377#comment-16658377
 ] 

Felix Cheung edited comment on SPARK-22947 at 10/21/18 8:53 PM:


So what's our take on this? It seems quite useful for time series analysis, 
which would be quite important for us.

First, there seems to be a question of syntax - either AS OF, or streaming 
INTERVAL, or some sort of join hint.

Second, how the optimizer can figure this out.

Perhaps the first is not a strict prereq for the second, but they are closely 
related. Are we considering splitting this proposal into two and focusing on 
getting the optimizer to figure this out first, perhaps?


was (Author: felixcheung):
so what's our take on this? it seems quite useful for time series analysis 
which would be quite important for us.

first, seems like there is a question of the syntax - either AS OF or streaming 
INTERVAL or some sort of join hint

second, how the optimizer can figure this out.

perhaps the first is not a strict prereq for the second, but they are closely 
related

> SPIP: as-of join in Spark SQL
> -
>
> Key: SPARK-22947
> URL: https://issues.apache.org/jira/browse/SPARK-22947
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.1
>Reporter: Li Jin
>Priority: Major
> Attachments: SPIP_ as-of join in Spark SQL (1).pdf
>
>
> h2. Background and Motivation
> Time series analysis is one of the most common analyses of financial data. In 
> time series analysis, as-of join is a very common operation. Supporting as-of 
> join in Spark SQL will allow many use cases of using Spark SQL for time 
> series analysis.
> As-of join is “join on time” with inexact time matching criteria. Various 
> libraries have implemented as-of join or similar functionality:
> Kdb: https://code.kx.com/wiki/Reference/aj
> Pandas: 
> http://pandas.pydata.org/pandas-docs/version/0.19.0/merging.html#merging-merge-asof
> R: This functionality is called “Last Observation Carried Forward”
> https://www.rdocumentation.org/packages/zoo/versions/1.8-0/topics/na.locf
> JuliaDB: http://juliadb.org/latest/api/joins.html#IndexedTables.asofjoin
> Flint: https://github.com/twosigma/flint#temporal-join-functions
> This proposal advocates introducing new API in Spark SQL to support as-of 
> join.
> h2. Target Personas
> Data scientists, data engineers
> h2. Goals
> * New API in Spark SQL that allows as-of join
> * As-of join of multiple tables (>2) should be performant, because it’s very 
> common that users need to join multiple data sources together for further 
> analysis.
> * Define Distribution, Partitioning and shuffle strategy for ordered time 
> series data
> h2. Non-Goals
> These are out of scope for the existing SPIP, should be considered in future 
> SPIP as improvement to Spark’s time series analysis ability:
> * Utilize partition information from data source, i.e, begin/end of each 
> partition to reduce sorting/shuffling
> * Define API for user to implement asof join time spec in business calendar 
> (i.e. lookback one business day, this is very common in financial data 
> analysis because of market calendars)
> * Support broadcast join
> h2. Proposed API Changes
> h3. TimeContext
> TimeContext is an object that defines the time scope of the analysis, it has 
> begin time (inclusive) and end time (exclusive). User should be able to 
> change the time scope of the analysis (i.e, from one month to five year) by 
> just changing the TimeContext. 
> To Spark engine, TimeContext is a hint that:
> can be used to repartition data for join
> serve as a predicate that can be pushed down to storage layer
> Time context is similar to filtering time by begin/end, the main difference 
> is that time context can be expanded based on the operation taken (see 
> example in as-of join).
> Time context example:
> {code:java}
> TimeContext timeContext = TimeContext("20160101", "20170101")
> {code}
> h3. asofJoin
> h4. User Case A (join without key)
> Join two DataFrames on time, with one day lookback:
> {code:java}
> TimeContext timeContext = TimeContext("20160101", "20170101")
> dfA = ...
> dfB = ...
> JoinSpec joinSpec = JoinSpec(timeContext).on("time").tolerance("-1day")
> result = dfA.asofJoin(dfB, joinSpec)
> {code}
> Example input/output:
> {code:java}
> dfA:
> time, quantity
> 20160101, 100
> 20160102, 50
> 20160104, -50
> 20160105, 100
> dfB:
> time, price
> 20151231, 100.0
> 20160104, 105.0
> 20160105, 102.0
> output:
> time, quantity, price
> 20160101, 100, 100.0
> 20160102, 50, null
> 20160104, -50, 105.0
> 20160105, 100, 102.0
> {code}
> Note row (20160101, 100) of dfA is joined with (20151231, 100.0) of dfB. This 
> is an important illustration of the time context - it is able to expand the 
> context to 

[jira] [Commented] (SPARK-22947) SPIP: as-of join in Spark SQL

2018-10-21 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-22947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658377#comment-16658377
 ] 

Felix Cheung commented on SPARK-22947:
--

So what's our take on this? It seems quite useful for time series analysis, 
which would be quite important for us.

First, there seems to be a question of syntax - either AS OF, a streaming 
INTERVAL, or some sort of join hint.

Second, how the optimizer can figure this out.

Perhaps the first is not a strict prerequisite for the second, but they are 
closely related.
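
To make the discussion concrete, here is a rough workaround sketch (not part of 
the SPIP, purely illustrative) of how the Use Case A semantics can be 
approximated with the existing DataFrame API: a range join followed by a window 
that keeps only the most recent dfB row within the one-day lookback. The helper 
name approxAsofJoin and the assumption that both inputs have a timestamp column 
named "time" are mine, not the proposal's.

{code:scala}
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// Illustrative only: approximate asofJoin with on("time"), tolerance("-1day").
def approxAsofJoin(dfA: DataFrame, dfB: DataFrame): DataFrame = {
  // tag each left-side row so exactly one match per row is kept
  val left  = dfA.withColumn("_row_id", monotonically_increasing_id())
  val right = dfB.withColumnRenamed("time", "b_time")

  // range join: any dfB row observed at or before the dfA time,
  // but no more than one day earlier
  val joined = left.join(
    right,
    expr("b_time <= time AND b_time >= time - INTERVAL 1 DAY"),
    "left_outer")

  // keep only the latest dfB observation within the lookback window
  val w = Window.partitionBy("_row_id").orderBy(col("b_time").desc_nulls_last)
  joined
    .withColumn("_rn", row_number().over(w))
    .where(col("_rn") === 1)
    .drop("_rn", "_row_id", "b_time")
}
{code}

This is of course far less convenient (and less optimizable) than a first-class 
asofJoin, which is the point of the SPIP.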

> SPIP: as-of join in Spark SQL
> -
>
> Key: SPARK-22947
> URL: https://issues.apache.org/jira/browse/SPARK-22947
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.2.1
>Reporter: Li Jin
>Priority: Major
> Attachments: SPIP_ as-of join in Spark SQL (1).pdf
>
>
> h2. Background and Motivation
> Time series analysis is one of the most common analyses performed on financial 
> data. In time series analysis, as-of join is a very common operation. 
> Supporting as-of join in Spark SQL will enable many use cases of Spark SQL for 
> time series analysis.
> As-of join is “join on time” with inexact time matching criteria. Various 
> libraries have implemented as-of join or similar functionality:
> Kdb: https://code.kx.com/wiki/Reference/aj
> Pandas: 
> http://pandas.pydata.org/pandas-docs/version/0.19.0/merging.html#merging-merge-asof
> R: This functionality is called “Last Observation Carried Forward”
> https://www.rdocumentation.org/packages/zoo/versions/1.8-0/topics/na.locf
> JuliaDB: http://juliadb.org/latest/api/joins.html#IndexedTables.asofjoin
> Flint: https://github.com/twosigma/flint#temporal-join-functions
> This proposal advocates introducing new API in Spark SQL to support as-of 
> join.
> h2. Target Personas
> Data scientists, data engineers
> h2. Goals
> * New API in Spark SQL that allows as-of join
> * As-of join of multiple tables (>2) should be performant, because it’s very 
> common that users need to join multiple data sources together for further 
> analysis.
> * Define Distribution, Partitioning and shuffle strategy for ordered time 
> series data
> h2. Non-Goals
> These are out of scope for the existing SPIP, should be considered in future 
> SPIP as improvement to Spark’s time series analysis ability:
> * Utilize partition information from data source, i.e, begin/end of each 
> partition to reduce sorting/shuffling
> * Define API for user to implement asof join time spec in business calendar 
> (i.e. lookback one business day, this is very common in financial data 
> analysis because of market calendars)
> * Support broadcast join
> h2. Proposed API Changes
> h3. TimeContext
> TimeContext is an object that defines the time scope of the analysis: it has a 
> begin time (inclusive) and an end time (exclusive). The user should be able to 
> change the time scope of the analysis (i.e., from one month to five years) by 
> just changing the TimeContext. 
> To the Spark engine, TimeContext is a hint that:
> * can be used to repartition data for the join
> * serves as a predicate that can be pushed down to the storage layer
> Time context is similar to filtering time by begin/end; the main difference 
> is that the time context can be expanded based on the operation taken (see the 
> example in as-of join).
> Time context example:
> {code:java}
> TimeContext timeContext = TimeContext("20160101", "20170101")
> {code}
> h3. asofJoin
> h4. Use Case A (join without key)
> Join two DataFrames on time, with one day lookback:
> {code:java}
> TimeContext timeContext = TimeContext("20160101", "20170101")
> dfA = ...
> dfB = ...
> JoinSpec joinSpec = JoinSpec(timeContext).on("time").tolerance("-1day")
> result = dfA.asofJoin(dfB, joinSpec)
> {code}
> Example input/output:
> {code:java}
> dfA:
> time, quantity
> 20160101, 100
> 20160102, 50
> 20160104, -50
> 20160105, 100
> dfB:
> time, price
> 20151231, 100.0
> 20160104, 105.0
> 20160105, 102.0
> output:
> time, quantity, price
> 20160101, 100, 100.0
> 20160102, 50, null
> 20160104, -50, 105.0
> 20160105, 100, 102.0
> {code}
> Note row (20160101, 100) of dfA is joined with (20151231, 100.0) of dfB. This 
> is an important illustration of the time context - it is able to expand the 
> context to 20151231 on dfB because of the 1 day lookback.
> h4. Use Case B (join with key)
> To join on time and another key (for instance, id), we use “by” to specify 
> the key.
> {code:java}
> TimeContext timeContext = TimeContext("20160101", "20170101")
> dfA = ...
> dfB = ...
> JoinSpec joinSpec = 
> JoinSpec(timeContext).on("time").by("id").tolerance("-1day")
> result = dfA.asofJoin(dfB, joinSpec)
> {code}
> Example input/output:
> {code:java}
> dfA:
> time, id, quantity
> 20160101, 1, 100
> 20160101, 2, 50
> 20160102, 1, -50
> 20160102, 2, 50
> dfB:
> time, id, 

[jira] [Resolved] (SPARK-24207) PrefixSpan: R API

2018-10-21 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung resolved SPARK-24207.
--
   Resolution: Fixed
Fix Version/s: 3.0.0

> PrefixSpan: R API
> -
>
> Key: SPARK-24207
> URL: https://issues.apache.org/jira/browse/SPARK-24207
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 2.4.0
>Reporter: Felix Cheung
>Assignee: Huaxin Gao
>Priority: Major
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-24207) PrefixSpan: R API

2018-10-21 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung reassigned SPARK-24207:


Assignee: Huaxin Gao

> PrefixSpan: R API
> -
>
> Key: SPARK-24207
> URL: https://issues.apache.org/jira/browse/SPARK-24207
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Affects Versions: 2.4.0
>Reporter: Felix Cheung
>Assignee: Huaxin Gao
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25634) New Metrics in External Shuffle Service to help identify abusing application

2018-10-21 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16658343#comment-16658343
 ] 

Felix Cheung commented on SPARK-25634:
--

how about off-heap and netty buffer usage?

> New Metrics in External Shuffle Service to help identify abusing application
> 
>
> Key: SPARK-25634
> URL: https://issues.apache.org/jira/browse/SPARK-25634
> Project: Spark
>  Issue Type: Improvement
>  Components: Shuffle
>Affects Versions: 2.4.0
>Reporter: Ye Zhou
>Priority: Minor
>
> We run Spark on YARN and deploy the Spark external shuffle service as part of 
> the YARN NM aux services. The External Shuffle Service is shared by all Spark 
> applications. SPARK-24355 enables reserving threads to handle 
> non-ChunkFetchRequest messages. SPARK-21501 limits the memory usage of the 
> Guava cache to avoid an OOM in the shuffle service that could crash the 
> NodeManager. But some applications may still generate a large number of 
> shuffle blocks, which can heavily degrade performance on some shuffle servers. 
> When this abusive behavior happens, it may further decrease the overall 
> performance of other applications that happen to use the same shuffle servers. 
> We have been seeing issues like this in our cluster, but there is no way for 
> us to figure out which application is abusing the shuffle service.
> SPARK-18364 enabled exposing shuffle service metrics to the Hadoop Metrics 
> System. It would be better if we also had the following metrics, broken down 
> by application ID:
> 1. *shuffle server on-heap memory consumption for caching shuffle indexes*
> 2. *breakdown of shuffle index caching memory consumption by local 
> executors*
> We can generate these metrics in 
> ExternalShuffleBlockHandler-->getSortBasedShuffleBlockData, which triggers the 
> cache load; the shuffle index file size is roughly available when an entry is 
> put into the cache and when it is evicted.
> 3. *shuffle server load for shuffle block fetch requests*
> 4. *breakdown of shuffle server block fetch request load by remote executors*
> We can generate these metrics in ExternalShuffleBlockHandler-->handleMessage 
> when a new OpenBlocks message is received.
> Open discussion for more metrics that could potentially influence the overall 
> shuffle service performance. 
> We can print the per-application metrics to the log, since it is hard to 
> define a fixed key with a numerical value for this kind of metric. 
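
As a rough illustration only (this is not code from the shuffle service; the 
class and metric names below are made up for the sketch), per-application 
counters could be kept with the Dropwizard metrics library that Spark's metrics 
system already builds on, and then reported through the Hadoop Metrics System 
integration from SPARK-18364 or simply printed to the NodeManager log:

{code:scala}
import com.codahale.metrics.MetricRegistry

// Illustrative only: per-application shuffle-service metrics keyed by appId.
// MetricRegistry.counter()/meter() are get-or-create and thread-safe, so no
// extra per-application map is needed.
class PerAppShuffleMetrics(registry: MetricRegistry) {

  // to be called where a shuffle index file is loaded into the index cache
  def recordIndexCacheLoad(appId: String, indexFileSizeBytes: Long): Unit =
    registry.counter(s"indexCacheBytes.$appId").inc(indexFileSizeBytes)

  // to be called where an OpenBlocks message is handled for a fetch request
  def recordOpenBlocks(appId: String, numBlocks: Int): Unit =
    registry.meter(s"openBlockRequests.$appId").mark(numBlocks.toLong)
}
{code}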



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25675) [Spark Job History] Job UI page does not show pagination with one page

2018-10-21 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung resolved SPARK-25675.
--
   Resolution: Fixed
Fix Version/s: 3.0.0

> [Spark Job History] Job UI page does not show pagination with one page
> --
>
> Key: SPARK-25675
> URL: https://issues.apache.org/jira/browse/SPARK-25675
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.1
>Reporter: ABHISHEK KUMAR GUPTA
>Assignee: Shivu Sondur
>Priority: Major
> Fix For: 3.0.0
>
>
> 1. Set spark.ui.retainedJobs= 1 in the spark-defaults conf of the Spark Job History server
>  2. Restart the Job History server
>  3. Submit Beeline jobs for 1
>  4. Launch the Job History UI page
>  5. Select the running JDBC application ID from the Incomplete Applications page
>  6. Launch the Job page
>  7. The pagination panel is displayed based on the page size, as below
>  
> 
>  Completed Jobs XXX
>  Page: 1 2 3 ... XX Page: Jump to 1 show 100 items in a 
> page
>  
> -
>  8. Change the value in "Jump to 1 show *XXX* items in a page", i.e. display 
> all completed Jobs in a single page
> *Actual Result:*
>  All completed Jobs are displayed in a single page, but there is no pagination 
> panel with which the user can modify the number of Jobs shown per page.
> *Expected Result:*
>  It should display the pagination panel as below
>  >>>
>  Page: 1                                                             1 Page: 
> Jump to 1 show *XXX* items in a page
>  
>  Pagination with page size *1* because it is displaying the total number of 
> completed Jobs in a single page.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25675) [Spark Job History] Job UI page does not show pagination with one page

2018-10-21 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung reassigned SPARK-25675:


Assignee: Shivu Sondur

> [Spark Job History] Job UI page does not show pagination with one page
> --
>
> Key: SPARK-25675
> URL: https://issues.apache.org/jira/browse/SPARK-25675
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.3.1
>Reporter: ABHISHEK KUMAR GUPTA
>Assignee: Shivu Sondur
>Priority: Major
> Fix For: 3.0.0
>
>
> 1. Set spark.ui.retainedJobs= 1 in the spark-defaults conf of the Spark Job History server
>  2. Restart the Job History server
>  3. Submit Beeline jobs for 1
>  4. Launch the Job History UI page
>  5. Select the running JDBC application ID from the Incomplete Applications page
>  6. Launch the Job page
>  7. The pagination panel is displayed based on the page size, as below
>  
> 
>  Completed Jobs XXX
>  Page: 1 2 3 ... XX Page: Jump to 1 show 100 items in a 
> page
>  
> -
>  8. Change the value in "Jump to 1 show *XXX* items in a page", i.e. display 
> all completed Jobs in a single page
> *Actual Result:*
>  All completed Jobs are displayed in a single page, but there is no pagination 
> panel with which the user can modify the number of Jobs shown per page.
> *Expected Result:*
>  It should display the pagination panel as below
>  >>>
>  Page: 1                                                             1 Page: 
> Jump to 1 show *XXX* items in a page
>  
>  Pagination with page size *1* because it is displaying the total number of 
> completed Jobs in a single page.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25730) Kubernetes scheduler tries to read pod details that it just deleted

2018-10-21 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-25730:
-
Affects Version/s: (was: 2.5.0)

> Kubernetes scheduler tries to read pod details that it just deleted
> ---
>
> Key: SPARK-25730
> URL: https://issues.apache.org/jira/browse/SPARK-25730
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Mike Kaplinskiy
>Assignee: Mike Kaplinskiy
>Priority: Major
> Fix For: 3.0.0
>
>
> See [https://github.com/apache/spark/pull/22720/files] for the fix.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-25730) Kubernetes scheduler tries to read pod details that it just deleted

2018-10-21 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung reassigned SPARK-25730:


Assignee: Mike Kaplinskiy

> Kubernetes scheduler tries to read pod details that it just deleted
> ---
>
> Key: SPARK-25730
> URL: https://issues.apache.org/jira/browse/SPARK-25730
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Mike Kaplinskiy
>Assignee: Mike Kaplinskiy
>Priority: Major
> Fix For: 3.0.0
>
>
> See [https://github.com/apache/spark/pull/22720/files] for the fix.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25730) Kubernetes scheduler tries to read pod details that it just deleted

2018-10-21 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung resolved SPARK-25730.
--
   Resolution: Fixed
Fix Version/s: 3.0.0

> Kubernetes scheduler tries to read pod details that it just deleted
> ---
>
> Key: SPARK-25730
> URL: https://issues.apache.org/jira/browse/SPARK-25730
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Mike Kaplinskiy
>Assignee: Mike Kaplinskiy
>Priority: Major
> Fix For: 3.0.0
>
>
> See [https://github.com/apache/spark/pull/22720/files] for the fix.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25572) SparkR tests failed on CRAN on Java 10

2018-09-29 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-25572:
-
Description: 
follow up to SPARK-24255

from the 2.3.2 release we can see that CRAN doesn't seem to respect the system 
requirements when running tests - we have seen cases where SparkR is run on Java 
10, which unfortunately Spark does not start on. For 2.4.x, let's attempt 
skipping all tests

  was:
follow up to SPARK-24255

from 2.3.2 release we can see that CRAN doesn't seem to respect the system 
requirements as running tests - we have seen cases where SparkR is run on Java 
10, which unfortunately Spark does not start on. For 2.4, lets attempt skipping 
all tests


> SparkR tests failed on CRAN on Java 10
> --
>
> Key: SPARK-25572
> URL: https://issues.apache.org/jira/browse/SPARK-25572
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.4.0
>Reporter: Felix Cheung
>Assignee: Felix Cheung
>Priority: Major
> Fix For: 2.4.1, 2.5.0
>
>
> follow up to SPARK-24255
> from the 2.3.2 release we can see that CRAN doesn't seem to respect the system 
> requirements when running tests - we have seen cases where SparkR is run on 
> Java 10, which unfortunately Spark does not start on. For 2.4.x, let's attempt 
> skipping all tests



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25572) SparkR tests failed on CRAN on Java 10

2018-09-29 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633129#comment-16633129
 ] 

Felix Cheung commented on SPARK-25572:
--

[~cloud_fan] while not a blocker, it would be great to include in 2.4.0 if we 
have another RC

> SparkR tests failed on CRAN on Java 10
> --
>
> Key: SPARK-25572
> URL: https://issues.apache.org/jira/browse/SPARK-25572
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.4.0
>Reporter: Felix Cheung
>Assignee: Felix Cheung
>Priority: Major
> Fix For: 2.4.1, 2.5.0
>
>
> follow up to SPARK-24255
> from the 2.3.2 release we can see that CRAN doesn't seem to respect the system 
> requirements when running tests - we have seen cases where SparkR is run on 
> Java 10, which unfortunately Spark does not start on. For 2.4, let's attempt 
> skipping all tests



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25572) SparkR tests failed on CRAN on Java 10

2018-09-29 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633130#comment-16633130
 ] 

Felix Cheung commented on SPARK-25572:
--

[~shivaram]

> SparkR tests failed on CRAN on Java 10
> --
>
> Key: SPARK-25572
> URL: https://issues.apache.org/jira/browse/SPARK-25572
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.4.0
>Reporter: Felix Cheung
>Assignee: Felix Cheung
>Priority: Major
> Fix For: 2.4.1, 2.5.0
>
>
> follow up to SPARK-24255
> from the 2.3.2 release we can see that CRAN doesn't seem to respect the system 
> requirements when running tests - we have seen cases where SparkR is run on 
> Java 10, which unfortunately Spark does not start on. For 2.4, let's attempt 
> skipping all tests



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25572) SparkR tests failed on CRAN on Java 10

2018-09-29 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung resolved SPARK-25572.
--
  Resolution: Fixed
   Fix Version/s: 2.5.0
  2.4.1
Target Version/s: 2.4.1, 2.5.0

> SparkR tests failed on CRAN on Java 10
> --
>
> Key: SPARK-25572
> URL: https://issues.apache.org/jira/browse/SPARK-25572
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.4.0
>Reporter: Felix Cheung
>Assignee: Felix Cheung
>Priority: Major
> Fix For: 2.4.1, 2.5.0
>
>
> follow up to SPARK-24255
> from the 2.3.2 release we can see that CRAN doesn't seem to respect the system 
> requirements when running tests - we have seen cases where SparkR is run on 
> Java 10, which unfortunately Spark does not start on. For 2.4, let's attempt 
> skipping all tests



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25572) SparkR tests failed on CRAN on Java 10

2018-09-28 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-25572:
-
Summary: SparkR tests failed on CRAN on Java 10  (was: SparkR to skip tests 
because Java 10)

> SparkR tests failed on CRAN on Java 10
> --
>
> Key: SPARK-25572
> URL: https://issues.apache.org/jira/browse/SPARK-25572
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.4.0
>Reporter: Felix Cheung
>Assignee: Felix Cheung
>Priority: Major
>
> follow up to SPARK-24255
> from the 2.3.2 release we can see that CRAN doesn't seem to respect the system 
> requirements when running tests - we have seen cases where SparkR is run on 
> Java 10, which unfortunately Spark does not start on. For 2.4, let's attempt 
> skipping all tests



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25572) SparkR to skip tests because Java 10

2018-09-28 Thread Felix Cheung (JIRA)
Felix Cheung created SPARK-25572:


 Summary: SparkR to skip tests because Java 10
 Key: SPARK-25572
 URL: https://issues.apache.org/jira/browse/SPARK-25572
 Project: Spark
  Issue Type: Bug
  Components: SparkR
Affects Versions: 2.4.0
Reporter: Felix Cheung
Assignee: Felix Cheung


follow up to SPARK-24255

from the 2.3.2 release we can see that CRAN doesn't seem to respect the system 
requirements when running tests - we have seen cases where SparkR is run on Java 
10, which unfortunately Spark does not start on. For 2.4, let's attempt skipping 
all tests



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21291) R bucketBy partitionBy API

2018-09-27 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16630410#comment-16630410
 ] 

Felix Cheung commented on SPARK-21291:
--

Wait. I don’t think saveAsTable is the same thing?




> R bucketBy partitionBy API
> --
>
> Key: SPARK-21291
> URL: https://issues.apache.org/jira/browse/SPARK-21291
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Affects Versions: 2.2.0
>Reporter: Felix Cheung
>Assignee: Huaxin Gao
>Priority: Major
> Fix For: 2.5.0
>
>
> partitionBy exists but it's for windowspec only



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21291) R bucketBy partitionBy API

2018-09-26 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628721#comment-16628721
 ] 

Felix Cheung commented on SPARK-21291:
--

The PR did not have bucketBy?




> R bucketBy partitionBy API
> --
>
> Key: SPARK-21291
> URL: https://issues.apache.org/jira/browse/SPARK-21291
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Affects Versions: 2.2.0
>Reporter: Felix Cheung
>Assignee: Huaxin Gao
>Priority: Major
> Fix For: 2.5.0
>
>
> partitionBy exists but it's for windowspec only



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23648) extend hint syntax to support any expression for R

2018-09-19 Thread Felix Cheung (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung resolved SPARK-23648.
--
   Resolution: Fixed
 Assignee: Huaxin Gao
Fix Version/s: 2.5.0

> extend hint syntax to support any expression for R
> --
>
> Key: SPARK-23648
> URL: https://issues.apache.org/jira/browse/SPARK-23648
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR, SQL
>Affects Versions: 2.2.0, 2.3.0
>Reporter: Dylan Guedes
>Assignee: Huaxin Gao
>Priority: Major
> Fix For: 2.5.0
>
>
> Relax checks in
> [https://github.com/apache/spark/blob/7f203a248f94df6183a4bc4642a3d873171fef29/R/pkg/R/DataFrame.R#L3746]
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24572) "eager execution" for R shell, IDE

2018-09-17 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16617120#comment-16617120
 ] 

Felix Cheung commented on SPARK-24572:
--

Thanks! Very close - showDF doesn't return anything, so we should refactor this 
slightly as 

 
{code:java}
setMethod("show", "SparkDataFrame",
  function(object) {
if (identical(sparkR.conf("spark.sql.repl.eagerEval.enabled", 
"false")[[1]], "true")) {
  showDF(object)
} else {
  cols <- lapply(dtypes(object), function(l) {
paste(l, collapse = ":")
  })
  s <- paste(cols, collapse = ", ")
  cat(paste(class(object), "[", s, "]\n", sep = ""))
}
  })

{code}
 

> "eager execution" for R shell, IDE
> --
>
> Key: SPARK-24572
> URL: https://issues.apache.org/jira/browse/SPARK-24572
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.4.0
>Reporter: Felix Cheung
>Priority: Major
>
> Like Python in SPARK-24215,
> we could also have eager execution when a SparkDataFrame is returned to the R 
> shell.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-21291) R bucketBy partitionBy API

2018-09-17 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16617118#comment-16617118
 ] 

Felix Cheung edited comment on SPARK-21291 at 9/17/18 6:07 AM:
---

I think it should be like this one

[https://spark.apache.org/docs/latest/api/R/write.stream.html]

but added to write.df()
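
For context, a rough Scala sketch (illustrative only; the column names, path, 
and table name are placeholders) of the JVM-side DataFrameWriter calls that 
such partitionBy/bucketBy options on write.df()/saveAsTable() would map to:

{code:scala}
import org.apache.spark.sql.DataFrame

def writePartitionedAndBucketed(df: DataFrame): Unit = {
  // partitionBy alone works with a plain path-based save
  df.write
    .partitionBy("date")
    .mode("overwrite")
    .parquet("/tmp/events_by_date")

  // bucketBy (optionally with sortBy) currently requires saveAsTable
  df.write
    .partitionBy("date")
    .bucketBy(8, "id")
    .sortBy("id")
    .mode("overwrite")
    .saveAsTable("events_bucketed")
}
{code}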


was (Author: felixcheung):
I think it should be like this one

 

[https://spark.apache.org/docs/latest/api/R/write.stream.html]

 

> R bucketBy partitionBy API
> --
>
> Key: SPARK-21291
> URL: https://issues.apache.org/jira/browse/SPARK-21291
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Affects Versions: 2.2.0
>Reporter: Felix Cheung
>Priority: Major
>
> partitionBy exists but it's for windowspec only



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21291) R bucketBy partitionBy API

2018-09-17 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16617118#comment-16617118
 ] 

Felix Cheung commented on SPARK-21291:
--

I think it should be like this one

 

[https://spark.apache.org/docs/latest/api/R/write.stream.html]

 

> R bucketBy partitionBy API
> --
>
> Key: SPARK-21291
> URL: https://issues.apache.org/jira/browse/SPARK-21291
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Affects Versions: 2.2.0
>Reporter: Felix Cheung
>Priority: Major
>
> partitionBy exists but it's for windowspec only



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21291) R bucketBy partitionBy API

2018-09-13 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-21291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16613994#comment-16613994
 ] 

Felix Cheung commented on SPARK-21291:
--

No, you wouldn’t return a writer in R. I will reply with more details in a few 
days




> R bucketBy partitionBy API
> --
>
> Key: SPARK-21291
> URL: https://issues.apache.org/jira/browse/SPARK-21291
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Affects Versions: 2.2.0
>Reporter: Felix Cheung
>Priority: Major
>
> partitionBy exists but it's for windowspec only



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23200) Reset configuration when restarting from checkpoints

2018-09-10 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610089#comment-16610089
 ] 

Felix Cheung commented on SPARK-23200:
--

probably needs someone to rebuild it against the current config names...

> Reset configuration when restarting from checkpoints
> 
>
> Key: SPARK-23200
> URL: https://issues.apache.org/jira/browse/SPARK-23200
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Anirudh Ramanathan
>Priority: Major
>
> Streaming workloads and restarting from checkpoints may need additional 
> changes, i.e. resetting properties -  see 
> https://github.com/apache-spark-on-k8s/spark/pull/516



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-22632) Fix the behavior of timestamp values for R's DataFrame to respect session timezone

2018-09-10 Thread Felix Cheung (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-22632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16610083#comment-16610083
 ] 

Felix Cheung commented on SPARK-22632:
--

A mismatch between the R and JVM time zones could be an issue but not a blocker 
for the release. Let's move this to 3.0.
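
For reference, a minimal Scala sketch (assuming a running SparkSession named 
spark) of the session-timezone behaviour that SparkR's collect() is expected to 
mirror - casting a timestamp to a string follows spark.sql.session.timeZone, 
not the client's system timezone:

{code:scala}
// Set the session timezone, matching the example in the description below.
spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")

// 28801 seconds after the epoch is 08:00:01 UTC, i.e. 00:00:01 in Los Angeles.
spark.sql("SELECT cast(cast(28801 AS timestamp) AS string) AS ts").show()
// -> 1970-01-01 00:00:01
{code}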

> Fix the behavior of timestamp values for R's DataFrame to respect session 
> timezone
> --
>
> Key: SPARK-22632
> URL: https://issues.apache.org/jira/browse/SPARK-22632
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR, SQL
>Affects Versions: 2.3.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Note: wording is borrowed from SPARK-22395. The symptom is similar and I think 
> that JIRA describes it well.
> When converting R's DataFrame from/to a Spark DataFrame using 
> {{createDataFrame}} or {{collect}}, timestamp values respect the R 
> system timezone instead of the session timezone.
> For example, let's say we use "America/Los_Angeles" as the session timezone and 
> have a timestamp value "1970-01-01 00:00:01" in that timezone. By the way, I'm 
> in South Korea, so the R timezone would be "KST".
> The timestamp value from the current collect() will be the following:
> {code}
> > sparkR.session(master = "local[*]", sparkConfig = 
> > list(spark.sql.session.timeZone = "America/Los_Angeles"))
> > collect(sql("SELECT cast(cast(28801 as timestamp) as string) as ts"))
>ts
> 1 1970-01-01 00:00:01
> > collect(sql("SELECT cast(28801 as timestamp) as ts"))
>ts
> 1 1970-01-01 17:00:01
> {code}
> As you can see, the value becomes "1970-01-01 17:00:01" because it respects the 
> R system timezone.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


