[jira] [Commented] (AIRFLOW-198) Create a ShortCircuitIfNotCurrentOperator

2016-08-18 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/AIRFLOW-198?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15426233#comment-15426233
 ] 

Felix Cheung commented on AIRFLOW-198:
--

I'm new but would love to look into this.
Sounds like we still need "Avoid scheduling multiple instances of a task that 
has been marked as only_run_latest and prioritize the most recent execution 
date"

> Create a ShortCircuitIfNotCurrentOperator
> --
>
> Key: AIRFLOW-198
> URL: https://issues.apache.org/jira/browse/AIRFLOW-198
> Project: Apache Airflow
>  Issue Type: New Feature
>Reporter: Peter Attardo
>Assignee: Siddharth Anand
>
> Taken from: https://cwiki.apache.org/confluence/display/AIRFLOW/Roadmap
> "For cases where we need to only run the latest in a series of task instance 
> runs and mark the others as skipped. For example, we may have a job to execute 
> a DB snapshot every day. If the DAG is paused for 5 days and then unpaused, 
> we don’t want to run all 5, just the latest. With this feature, we will 
> provide “cron” functionality for task scheduling that is not related to ETL"
> I've decided to implement this as a subclass of ShortCircuitOperator - 
> OnlyRunCurrentOperator/ShortCircuitIfNotCurrentOperator will skip downstream 
> if the dag run being executed is not current.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (BIGTOP-2690) gradlew toolchain fails trying to download Ant 1.9.8

2017-02-19 Thread Felix Cheung (JIRA)
Felix Cheung created BIGTOP-2690:


 Summary: gradlew toolchain fails trying to download Ant 1.9.8
 Key: BIGTOP-2690
 URL: https://issues.apache.org/jira/browse/BIGTOP-2690
 Project: Bigtop
  Issue Type: Bug
  Components: build
Affects Versions: 1.1.0
Reporter: Felix Cheung


toolchain tries to download Ant 1.9.8, but 1.9.8 has been removed from Apache 
mirrors.
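
Released artifacts that rotate off the mirrors do remain on archive.apache.org, so pointing the toolchain there is one likely fix. A quick, hypothetical probe (not part of the Bigtop toolchain) illustrating the gap:

{code}
# HEAD-request the Ant 1.9.8 tarball on both hosts (Python 2, matching the
# era of this report): expect 404 on the mirror path, 200 on the archive.
import urllib2

for base in ('http://www.apache.org/dist', 'http://archive.apache.org/dist'):
    url = base + '/ant/binaries/apache-ant-1.9.8-bin.tar.gz'
    req = urllib2.Request(url)
    req.get_method = lambda: 'HEAD'
    try:
        print('%s %s' % (urllib2.urlopen(req).getcode(), url))
    except urllib2.HTTPError as e:
        print('%s %s' % (e.code, url))
{code}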




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (BIGTOP-2693) Update readme for build and configuring git repo for packages

2017-03-01 Thread Felix Cheung (JIRA)
Felix Cheung created BIGTOP-2693:


 Summary: Update readme for build and configuring git repo for 
packages
 Key: BIGTOP-2693
 URL: https://issues.apache.org/jira/browse/BIGTOP-2693
 Project: Bigtop
  Issue Type: Bug
  Components: build, documentation
Affects Versions: 1.1.0
Reporter: Felix Cheung
Assignee: Felix Cheung
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (FLINK-2691) Broken links to Python script on QuickStart doc

2015-09-16 Thread Felix Cheung (JIRA)
Felix Cheung created FLINK-2691:
---

 Summary: Broken links to Python script on QuickStart doc
 Key: FLINK-2691
 URL: https://issues.apache.org/jira/browse/FLINK-2691
 Project: Flink
  Issue Type: Bug
Reporter: Felix Cheung
Priority: Minor


Links to plotPoints.py are broken on 
https://ci.apache.org/projects/flink/flink-docs-release-0.9/quickstart/run_example_quickstart.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-7482) StringWriter to support compression

2017-08-20 Thread Felix Cheung (JIRA)
Felix Cheung created FLINK-7482:
---

 Summary: StringWriter to support compression
 Key: FLINK-7482
 URL: https://issues.apache.org/jira/browse/FLINK-7482
 Project: Flink
  Issue Type: Bug
  Components: filesystem-connector
Affects Versions: 1.3.2
Reporter: Felix Cheung


Is it possible to have StringWriter support compression like 
AvroKeyValueSinkWriter or SequenceFileWriter?




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (ZEPPELIN-3385) PySpark interpreter should handle .. for autocomplete

2018-04-04 Thread Felix Cheung (JIRA)
Felix Cheung created ZEPPELIN-3385:
--

 Summary: PySpark interpreter should handle .. for autocomplete
 Key: ZEPPELIN-3385
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3385
 Project: Zeppelin
  Issue Type: Bug
  Components: python-interpreter
Reporter: Felix Cheung


See thread here 
[https://github.com/apache/zeppelin/pull/2901#discussion_r178472173]

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZEPPELIN-3468) Undo paragraph change after find/replace erratic

2018-05-16 Thread Felix Cheung (JIRA)
Felix Cheung created ZEPPELIN-3468:
--

 Summary: Undo paragraph change after find/replace erratic
 Key: ZEPPELIN-3468
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3468
 Project: Zeppelin
  Issue Type: Bug
  Components: front-end
Affects Versions: 0.7.3, 0.8.0
Reporter: Felix Cheung


Find/Replace followed by Undo behaves erratically:

 # find some text
 # replace the text with something else
 # go to the paragraph and try to undo the replace (Ctrl-Z or Cmd-Z)

The undo of the replace seemed to apply to a block of text (but not always the 
whole paragraph).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZEPPELIN-3469) Confusing behavior when running on Java 9

2018-05-16 Thread Felix Cheung (JIRA)
Felix Cheung created ZEPPELIN-3469:
--

 Summary: Confusing behavior when running on Java 9
 Key: ZEPPELIN-3469
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-3469
 Project: Zeppelin
  Issue Type: Bug
  Components: Interpreters
Reporter: Felix Cheung


confusing error when running on Java 9 
 # start zeppelin on java 9
 # open the sample notebook
 # shift-enter to run (should run the spark interpreter)

 
{code:java}
java.lang.NullPointerException
	at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:44)
	at org.apache.zeppelin.spark.Utils.invokeMethod(Utils.java:39)
	at org.apache.zeppelin.spark.OldSparkInterpreter.createSparkContext_2(OldSparkInterpreter.java:375)
	at org.apache.zeppelin.spark.OldSparkInterpreter.createSparkContext(OldSparkInterpreter.java:364)
	at org.apache.zeppelin.spark.OldSparkInterpreter.getSparkContext(OldSparkInterpreter.java:172)
	at org.apache.zeppelin.spark.OldSparkInterpreter.open(OldSparkInterpreter.java:740)
	at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:61)
	at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:69)
	at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:617)
	at org.apache.zeppelin.scheduler.Job.run(Job.java:188)
	at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:140)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:514)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:299)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
	at java.base/java.lang.Thread.run(Thread.java:844)
{code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (ZEPPELIN-599) notebook search should search paragraph title

2016-01-12 Thread Felix Cheung (JIRA)
Felix Cheung created ZEPPELIN-599:
-

 Summary: notebook search should search paragraph title
 Key: ZEPPELIN-599
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-599
 Project: Zeppelin
  Issue Type: Bug
Reporter: Felix Cheung
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-600) notebook search should have a way to clear search and return to the previous page

2016-01-12 Thread Felix Cheung (JIRA)
Felix Cheung created ZEPPELIN-600:
-

 Summary: notebook search should have a way to clear search and 
return to the previous page
 Key: ZEPPELIN-600
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-600
 Project: Zeppelin
  Issue Type: Bug
Reporter: Felix Cheung


E.g. the Esc key, or perhaps a button to clear the search



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-601) Import note: "Add from URL" textbox is not clearing buttons from previous choices

2016-01-12 Thread Felix Cheung (JIRA)
Felix Cheung created ZEPPELIN-601:
-

 Summary: Import note: "Add from URL" textbox is not clearing buttons 
from previous choices
 Key: ZEPPELIN-601
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-601
 Project: Zeppelin
  Issue Type: Bug
Reporter: Felix Cheung
 Attachments: screenshot-1.png

Tested on Chrome



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-602) elasticsearch throws ArrayIndexOutOfBoundsException for interpreting an empty paragraph

2016-01-12 Thread Felix Cheung (JIRA)
Felix Cheung created ZEPPELIN-602:
-

 Summary: elasticsearch throws ArrayIndexOutOfBoundsException for 
interpreting an empty paragraph
 Key: ZEPPELIN-602
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-602
 Project: Zeppelin
  Issue Type: Bug
Reporter: Felix Cheung
Priority: Minor


To reproduce, run an empty paragraph:

%elasticsearch

then press Ctrl-Enter to run.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-632) Document steps to contribute a new interpreter

2016-01-25 Thread Felix Cheung (JIRA)
Felix Cheung created ZEPPELIN-632:
-

 Summary: Document steps to contribute a new interpreter
 Key: ZEPPELIN-632
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-632
 Project: Zeppelin
  Issue Type: Bug
Reporter: Felix Cheung
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-643) Shell interpreter improvements

2016-01-30 Thread Felix Cheung (JIRA)
Felix Cheung created ZEPPELIN-643:
-

 Summary: Shell interpreter improvements
 Key: ZEPPELIN-643
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-643
 Project: Zeppelin
  Issue Type: Bug
  Components: Interpreters
Affects Versions: 0.5.6
Reporter: Felix Cheung
Assignee: Karuppayya
Priority: Minor
 Fix For: 0.6.0


* Provide ability to run shell commands in parallel
* Provide ability to cancel a shell command
* Propagate errors from shell commands to the UI




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-651) HBase interpreter

2016-02-02 Thread Felix Cheung (JIRA)
Felix Cheung created ZEPPELIN-651:
-

 Summary: HBase interpreter
 Key: ZEPPELIN-651
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-651
 Project: Zeppelin
  Issue Type: Bug
  Components: Interpreters
Reporter: Felix Cheung
 Fix For: 0.6.0






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-655) HBase interpreter support for HBase 1.1.x releases

2016-02-04 Thread Felix Cheung (JIRA)
Felix Cheung created ZEPPELIN-655:
-

 Summary: HBase interpreter support for HBase 1.1.x releases
 Key: ZEPPELIN-655
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-655
 Project: Zeppelin
  Issue Type: Bug
  Components: Interpreters
Affects Versions: 0.6.0
Reporter: Felix Cheung






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-662) HBase interpreter should support CDH flavor of HBase

2016-02-06 Thread Felix Cheung (JIRA)
Felix Cheung created ZEPPELIN-662:
-

 Summary: HBase interpreter should support CDH flavor of HBase
 Key: ZEPPELIN-662
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-662
 Project: Zeppelin
  Issue Type: Bug
  Components: Interpreters
Affects Versions: 0.6.0
Reporter: Felix Cheung
Assignee: Felix Cheung






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-665) CI is running all interpreter tests in all 6 jobs in test matrix

2016-02-09 Thread Felix Cheung (JIRA)
Felix Cheung created ZEPPELIN-665:
-

 Summary: CI is running all interpreter tests in all 6 jobs in test 
matrix
 Key: ZEPPELIN-665
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-665
 Project: Zeppelin
  Issue Type: Bug
  Components: build, Interpreters
Affects Versions: 0.6.0
Reporter: Felix Cheung
Assignee: Felix Cheung






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-704) Display elapsed time for long-running paragraph

2016-02-27 Thread Felix Cheung (JIRA)
Felix Cheung created ZEPPELIN-704:
-

 Summary: Display elapsed time for long-running paragraph
 Key: ZEPPELIN-704
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-704
 Project: Zeppelin
  Issue Type: Bug
  Components: GUI
Affects Versions: 0.6.0
Reporter: Felix Cheung
Priority: Minor


For a long-running paragraph, it is hard to know how long it has already been 
running. We should display either the start time or the elapsed time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-767) HBase interpreter does not work with HBase on a remote cluster

2016-03-27 Thread Felix Cheung (JIRA)
Felix Cheung created ZEPPELIN-767:
-

 Summary: HBase interpreter does not work with HBase on a remote 
cluster
 Key: ZEPPELIN-767
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-767
 Project: Zeppelin
  Issue Type: Bug
  Components: Interpreters
Affects Versions: 0.6.0
Reporter: Felix Cheung
Assignee: Felix Cheung


HBase interpreter fails with message "ERROR: KeeperErrorCode = ConnectionLoss 
for /hbase".

Initially it was thought that the zk quorum settings were not getting applied, 
but deeper investigation revealed that hbase-site.xml cannot be loaded.

HBASE_HOME or HBASE_CONF_DIR is set by the `hbase` script when running the 
hbase shell - the interpreter will need to at minimum replicate that behavior 
and add the directory containing hbase-site.xml to the CLASSPATH in order to 
fix this issue.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-768) HBase interpreter does not work with HBase 1.1.4 (stable) or HBase 1.2.0

2016-03-27 Thread Felix Cheung (JIRA)
Felix Cheung created ZEPPELIN-768:
-

 Summary: HBase interpreter does not work with HBase 1.1.4 (stable) 
or HBase 1.2.0
 Key: ZEPPELIN-768
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-768
 Project: Zeppelin
  Issue Type: Bug
  Components: Interpreters
Affects Versions: 0.6.0
Reporter: Felix Cheung


When run
{code}
%hbase
list
{code}

This error is reported:
{code}
org.jruby.exceptions.RaiseException: (NameError) cannot load Java class 
org.apache.hadoop.hbase.quotas.ThrottleType
at 
org.jruby.javasupport.JavaUtilities.get_proxy_or_package_under_package(org/jruby/javasupport/JavaUtilities.java:54)
at (Anonymous).method_missing(/builtin/javasupport/java.rb:51)
at (Anonymous).(root)(/opt/hbase-1.1.4/lib/ruby/hbase/quotas.rb:23)
at org.jruby.RubyKernel.require(org/jruby/RubyKernel.java:1062)
at (Anonymous).(root)(/opt/hbase-1.1.4/lib/ruby/hbase/quotas.rb:24)
at org.jruby.RubyKernel.require(org/jruby/RubyKernel.java:1062)
at (Anonymous).(root)(/opt/hbase-1.1.4/lib/ruby/hbase/hbase.rb:96)
at org.jruby.RubyKernel.require(org/jruby/RubyKernel.java:1062)
at (Anonymous).(root)(/opt/hbase-1.1.4/lib/ruby/hbase.rb:105)
{code}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-802) Python variable should be clearly named to avoid being overwritten by user code

2016-04-11 Thread Felix Cheung (JIRA)
Felix Cheung created ZEPPELIN-802:
-

 Summary: Python variable should be clearly named to avoid being 
overwritten by user code
 Key: ZEPPELIN-802
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-802
 Project: Zeppelin
  Issue Type: Bug
Reporter: Felix Cheung
Assignee: Felix Cheung
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-803) ASF Infra: enable github comment from email reply

2016-04-11 Thread Felix Cheung (JIRA)
Felix Cheung created ZEPPELIN-803:
-

 Summary: ASF Infra: enable github comment from email reply
 Key: ZEPPELIN-803
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-803
 Project: Zeppelin
  Issue Type: Bug
  Components: CI-infra
Reporter: Felix Cheung
Assignee: Felix Cheung
Priority: Minor


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-827) Apache Gearpump interpreter

2016-04-25 Thread Felix Cheung (JIRA)
Felix Cheung created ZEPPELIN-827:
-

 Summary: Apache Gearpump interpreter
 Key: ZEPPELIN-827
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-827
 Project: Zeppelin
  Issue Type: Bug
  Components: Interpreters
Reporter: Felix Cheung
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-17) PySpark Interpreter should allow starting with a specific version of Python

2015-03-31 Thread Felix Cheung (JIRA)
Felix Cheung created ZEPPELIN-17:


 Summary: PySpark Interpreter should allow starting with a specific 
version of Python
 Key: ZEPPELIN-17
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-17
 Project: Zeppelin
  Issue Type: Bug
  Components: Interpreters
Reporter: Felix Cheung
Priority: Minor


PySpark Interpreter should allow starting with a specific version of Python, as 
PySpark does.
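
PySpark itself picks the binary from the PYSPARK_PYTHON environment variable; a minimal sketch, assuming the interpreter replicates that selection (only the variable name comes from PySpark, the surrounding code is illustrative):

{code}
# Choose the Python executable the way PySpark's launcher does, falling
# back to the default "python" when PYSPARK_PYTHON is unset.
import os
import subprocess

python_exec = os.environ.get('PYSPARK_PYTHON', 'python')
subprocess.check_call([python_exec, '--version'])
{code}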



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZEPPELIN-17) PySpark Interpreter should allow starting with a specific version of Python

2015-03-31 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/ZEPPELIN-17?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14389377#comment-14389377
 ] 

Felix Cheung commented on ZEPPELIN-17:
--

https://github.com/apache/incubator-zeppelin/pull/19

> PySpark Interpreter should allow starting with a specific version of Python
> ---
>
> Key: ZEPPELIN-17
> URL: https://issues.apache.org/jira/browse/ZEPPELIN-17
> Project: Zeppelin
>  Issue Type: Bug
>  Components: Interpreters
>Reporter: Felix Cheung
>Priority: Minor
>
> PySpark Interpreter should allow starting with a specific version of Python, 
> as PySpark does.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ZEPPELIN-10) Zeppelin isn't compatible with Docker.

2015-03-31 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/ZEPPELIN-10?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14389473#comment-14389473
 ] 

Felix Cheung commented on ZEPPELIN-10:
--

I think RJ is saying you could configure Docker to port forward.
https://docs.docker.com/articles/networking/

I've been forwarding both 8080 and 8081 ports when running in 
Vagrant/VirtualBox and it's working well.

> Zeppelin isn't compatible with Docker.
> 
>
> Key: ZEPPELIN-10
> URL: https://issues.apache.org/jira/browse/ZEPPELIN-10
> Project: Zeppelin
>  Issue Type: Bug
>  Components: GUI
>Affects Versions: 0.5.0
>Reporter: Egor Pakhomov
>Priority: Critical
>
> Zeppelin puts the UI on ZEPPELIN_PORT and the web socket on ZEPPELIN_PORT+1. When 
> you run Docker with the standard -P 
> argument (https://docs.docker.com/userguide/usingdocker/), it maps these ports 
> to some other ports. For example, you set ZEPPELIN_PORT=8080 and expose 8080 and 
> 8081, but in reality they would be mapped to some other ports (49159, 49160), and 
> Zeppelin doesn't know anything about them, so the UI doesn't work properly. 
> Please see https://github.com/epahomov/docker-zeppelin



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-51) Create official Zeppelin release

2015-04-18 Thread Felix Cheung (JIRA)
Felix Cheung created ZEPPELIN-51:


 Summary: Create official Zeppelin release
 Key: ZEPPELIN-51
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-51
 Project: Zeppelin
  Issue Type: Improvement
Reporter: Felix Cheung


Zeppelin should have official release builds such that users who would like to 
test or try Zeppelin out could easily do so without having to clone/build 
Zeppelin locally.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-75) PySpark interpreter - useful debugging traceback information is lost for any error from Spark

2015-05-09 Thread Felix Cheung (JIRA)
Felix Cheung created ZEPPELIN-75:


 Summary:  PySpark interpreter - useful debugging traceback 
information is lost for any error from Spark
 Key: ZEPPELIN-75
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-75
 Project: Zeppelin
  Issue Type: Bug
  Components: Interpreters
Affects Versions: 0.5.0
Reporter: Felix Cheung
Assignee: Felix Cheung


When there is an error from Spark, the original error is not returned as output 
in the cell; instead a generic Py4JError is shown:

(<class 'py4j.protocol.Py4JJavaError'>, Py4JJavaError(u'An error occurred while 
calling o45.collect.\n', JavaObject id=o46), <traceback object>)

While it is possible to look at zeppelin-interpreter-spark-root-node.log, it 
might not be accessible in a multi-user environment as it would require remote 
access to the host running Zeppelin.
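
A hypothetical sketch (not Zeppelin's actual interpreter code) of the desired behavior: render the full Python traceback into the cell output instead of the bare exc_info tuple:

{code}
import sys
import traceback

user_code = "1 / 0"  # stand-in for a paragraph that raises

try:
    exec(user_code)
except Exception:
    # format_exception preserves the original error type, message and frames
    print(''.join(traceback.format_exception(*sys.exc_info())))
{code}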




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-143) Need github integration

2015-06-29 Thread Felix Cheung (JIRA)
Felix Cheung created ZEPPELIN-143:
-

 Summary: Need github integration
 Key: ZEPPELIN-143
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-143
 Project: Zeppelin
  Issue Type: Improvement
  Components: Core
Affects Versions: 0.5.0, 0.6.0
Reporter: Felix Cheung
Priority: Minor


It would be nice if Zeppelin notebooks could be committed to git, possibly to 
github.

It may be possible to leverage the new/pending persistence extension layer for 
this, though I suspect it would require finer-grained control for git since, 
unlike a file system, one would likely not want a git commit for every single 
line change.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-185) z.show does not work on DataFrame in pyspark

2015-07-25 Thread Felix Cheung (JIRA)
Felix Cheung created ZEPPELIN-185:
-

 Summary: z.show does not work on DataFrame in pyspark
 Key: ZEPPELIN-185
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-185
 Project: Zeppelin
  Issue Type: Bug
  Components: Core, Interpreters
Affects Versions: 0.6.0
Reporter: Felix Cheung
Assignee: Felix Cheung


I’ve tested this out and found these issues. Firstly,

http://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=createdataframe#pyspark.sql.SQLContext.createDataFrame

# Code should be changed to this – it does not work in the pyspark CLI otherwise
from pyspark.sql import Row  # needed in the pyspark CLI
rdd = sc.parallelize(["1","2","3"])
Data = Row('first')
df = sqlContext.createDataFrame(rdd.map(lambda d: Data(d)))

Secondly, z.show() doesn’t seem to work properly in Python – I see the same 
error below: “AttributeError: 'DataFrame' object has no attribute '_get_object_id'"

# Python/PySpark – doesn’t work
rdd = sc.parallelize(["1","2","3"])
Data = Row('first')
df = sqlContext.createDataFrame(rdd.map(lambda d: Data(d)))
print df
print df.collect()
z.show(df)
AttributeError: 'DataFrame' object has no attribute '_get_object_id'

# Scala – this works
val a = sc.parallelize(List("1", "2", "3"))
val df = a.toDF()
z.show(df)


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-212) Interpreter should support returning a list of InterpreterResults

2015-08-09 Thread Felix Cheung (JIRA)
Felix Cheung created ZEPPELIN-212:
-

 Summary: Interpreter should support returning a list of 
InterpreterResults
 Key: ZEPPELIN-212
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-212
 Project: Zeppelin
  Issue Type: Improvement
Affects Versions: 0.6.0
Reporter: Felix Cheung


Discussed here:

https://github.com/apache/incubator-zeppelin/pull/164




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-214) Notebook editor should support languages from extensible interpreters

2015-08-10 Thread Felix Cheung (JIRA)
Felix Cheung created ZEPPELIN-214:
-

 Summary: Notebook editor should support languages from extensible 
interpreters
 Key: ZEPPELIN-214
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-214
 Project: Zeppelin
  Issue Type: Improvement
  Components: GUI
Affects Versions: 0.6.0
Reporter: Felix Cheung


From https://github.com/apache/incubator-zeppelin/pull/181

I think we need to make that extensible from the interpreters; that way it 
could support other modes like Python.

tzolov commented:
@felixcheung, thanks for the suggestion. I've thought of ZEPPELIN-188 as well. 
Indeed the dirty page event is a good trigger, but I am worried about the 
performance impact. Perhaps it is negligible; also we can improve the set 
logic to SET-ONLY-IF-DIFFERENT.

But let us first merge the current fix. This will resolve ZEPPELIN-141 and fix 
the broken autocompletion. Then we can generalize the solution (the way you 
have suggested) in the context of ZEPPELIN-188. What do you think?

I agree about the extensible interpreters. IMHO this will require extending the 
Interpreter interface, maybe extending the websocket protocol with an additional 
command, and figuring out how to enable the ACE features in the front-end. For 
example, now only a few ACE modes are defined in bower.json. If you need more 
(like SH or PYTHON, for example) one has to add them to bower.json and 
re-build. Maybe we can (pre)enable a large set of the ACE modes upfront.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-283) IllegalArgumentException when running Zeppelin on provided Spark 1.5.0 snapshot build

2015-09-06 Thread Felix Cheung (JIRA)
Felix Cheung created ZEPPELIN-283:
-

 Summary: IllegalArgumentException when running Zeppelin on 
provided Spark 1.5.0 snapshot build
 Key: ZEPPELIN-283
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-283
 Project: Zeppelin
  Issue Type: Bug
  Components: Interpreters
Affects Versions: 0.6.0
Reporter: Felix Cheung
Assignee: Felix Cheung
Priority: Blocker


In the case of a snapshot build, SparkContext.version returns the string 
"1.5.0-SNAPSHOT":

Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.5.0-SNAPSHOT
      /_/

scala> sc.version
res0: String = 1.5.0-SNAPSHOT

SparkVersion helper class is expecting the numeric parts only and thus fails to 
find a match.


java.lang.IllegalArgumentException
	at org.apache.zeppelin.spark.SparkVersion.fromVersionString(SparkVersion.java:57)
	at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:442)
	at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:74)
	at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
	at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:92)
	at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:276)
	at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
	at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:118)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
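
The fix idea, sketched in Python for brevity (SparkVersion itself is Java): tolerate a qualifier suffix such as "-SNAPSHOT" by matching only the leading numeric parts:

{code}
import re

def numeric_version(version_string):
    # Match "major.minor.patch" and ignore anything after it.
    m = re.match(r'(\d+)\.(\d+)\.(\d+)', version_string)
    if m is None:
        raise ValueError('unparseable Spark version: %s' % version_string)
    return tuple(int(p) for p in m.groups())

assert numeric_version('1.5.0-SNAPSHOT') == (1, 5, 0)
{code}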



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-299) Support clearing output for paragraph

2015-09-10 Thread Felix Cheung (JIRA)
Felix Cheung created ZEPPELIN-299:
-

 Summary: Support clearing output for paragraph
 Key: ZEPPELIN-299
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-299
 Project: Zeppelin
  Issue Type: Bug
  Components: GUI
Reporter: Felix Cheung
Priority: Minor


This is helpful in situations like running a presentation or collaborating on a 
notebook. We should allow clearing output for an individual paragraph and 
having that saved.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-310) Flink monitoring port conflict with Zeppelin web

2015-09-16 Thread Felix Cheung (JIRA)
Felix Cheung created ZEPPELIN-310:
-

 Summary: Flink monitoring port conflict with Zeppelin web
 Key: ZEPPELIN-310
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-310
 Project: Zeppelin
  Issue Type: Bug
Reporter: Felix Cheung
Priority: Minor


As per 
https://ci.apache.org/projects/flink/flink-docs-release-0.9/quickstart/run_example_quickstart.html

Monitoring is at
http://localhost:8080/launch.html

Which conflicts with Zeppelin's http://localhost:8080

It would be nice to configure Flink to use an alternative port if possible.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-328) Interpreter page should clarify the % magic syntax for interpreter group.name

2015-09-29 Thread Felix Cheung (JIRA)
Felix Cheung created ZEPPELIN-328:
-

 Summary: Interpreter page should clarify the % magic syntax for 
interpreter group.name
 Key: ZEPPELIN-328
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-328
 Project: Zeppelin
  Issue Type: Bug
  Components: GUI, Interpreters
Affects Versions: 0.6.0
Reporter: Felix Cheung
Assignee: Felix Cheung
Priority: Minor


Currently the Interpreter page lists the interpreters as
hive %hql

However, this does not work unless hive is the default group - otherwise one 
would require the full %group.name.

It seems it would be better to list interpreters as %group.name on the page.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-368) Document configurable properties for Spark interpreter

2015-10-27 Thread Felix Cheung (JIRA)
Felix Cheung created ZEPPELIN-368:
-

 Summary: Document configurable properties for Spark interpreter
 Key: ZEPPELIN-368
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-368
 Project: Zeppelin
  Issue Type: Bug
  Components: Interpreters
Affects Versions: 0.5.5
Reporter: Felix Cheung
Assignee: Felix Cheung
Priority: Minor


A few thoughts on improving the doc:

1. add properties for Spark, PySpark, DepInterpreter and their descriptions
2. add examples on using ZeppelinContext in PySpark - e.g. passing a DataFrame 
(see the sketch below)
3. how to load Zeppelin with included or out-of-the-box Spark
4. configuring Spark: loading the Spark conf file, env variables, etc.

Ideas welcome!
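
For item 2, a minimal sketch of the kind of example the doc could include, assuming ZeppelinContext's put/get sharing between paragraphs (the name "shared_df" and the toy data are illustrative):

{code}
%pyspark
# Paragraph A: build a DataFrame and share it via ZeppelinContext
df = sqlContext.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])
z.put("shared_df", df)

# Paragraph B, later: retrieve and display it
df2 = z.get("shared_df")
z.show(df2)
{code}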




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-378) Clarify uses of spark.home property vs SPARK_HOME env var

2015-10-30 Thread Felix Cheung (JIRA)
Felix Cheung created ZEPPELIN-378:
-

 Summary: Clarify uses of spark.home property vs SPARK_HOME env var
 Key: ZEPPELIN-378
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-378
 Project: Zeppelin
  Issue Type: Bug
  Components: Interpreters
Affects Versions: 0.6.0
Reporter: Felix Cheung
Priority: Minor


The interpreter property 'spark.home' is a little bit confusing with SPARK_HOME.
At the moment, defining SPARK_HOME in conf/zeppelin-env.sh is recommended 
instead of spark.home.

Best,
moon

On Fri, Oct 30, 2015 at 2:44 AM Jeff Steinmetz  
wrote:
That’s a good pointer.
Question still stands, how do you load libraries (jars) for %pyspark?

It's clear how to do it for %spark (scala) via %dep.

Looking for the equivalent of:

./bin/pyspark --master local[2] --jars jars/elasticsearch-hadoop-2.1.0.Beta2.jar


From: Matt Sochor
Reply-To: 
Date: Thursday, October 29, 2015 at 3:19 PM
To: 
Subject: Re: pyspark with jar

I actually *just* figured it out.  Zeppelin has sqlContext "already created and 
exposed" (https://zeppelin.incubator.apache.org/docs/interpreter/spark.html).

So when I do "sqlContext = SQLContext(sc)" I overwrite sqlContext.  Then 
Zeppelin cannot see this new sqlContext.

Anyway, anyone out there experiencing this problem, do NOT initialize 
sqlContext and it works fine.  

On Thu, Oct 29, 2015 at 6:10 PM Jeff Steinmetz  
wrote:
In zeppelin, what is the equivalent to adding jars in a pyspark call?

Such as running pyspark with the elasticsearch-hadoop jar

./bin/pyspark --master local[2] --jars jars/elasticsearch-hadoop-2.1.0.Beta2.jar

My assumption is that loading something like this inside a %dep is pointless, 
since those dependencies would only live in the %spark scala world (the spark 
jvm).  In zeppelin - pyspark spawns a separate process.

Also, how is the interpreter's “spark.home” used? How is it different from the 
“SPARK_HOME” in zeppelin-env.sh?
And finally – how are args used in the interpreter? (what uses them?)

Thank you.
Jeff



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-379) Pyspark needs to warn user when sqlContext is overwritten

2015-10-30 Thread Felix Cheung (JIRA)
Felix Cheung created ZEPPELIN-379:
-

 Summary: Pyspark needs to warn user when sqlContext is overwritten
 Key: ZEPPELIN-379
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-379
 Project: Zeppelin
  Issue Type: Bug
  Components: Interpreters
Affects Versions: 0.6.0
Reporter: Felix Cheung
Priority: Minor


I think this can be a fairly big usability problem; we need to know how to warn 
users about it.

From: Matt Sochor
Reply-To: 
Date: Thursday, October 29, 2015 at 3:19 PM
To: 
Subject: Re: pyspark with jar

I actually *just* figured it out.  Zeppelin has sqlContext "already created and 
exposed" (https://zeppelin.incubator.apache.org/docs/interpreter/spark.html).

So when I do "sqlContext = SQLContext(sc)" I overwrite sqlContext.  Then 
Zeppelin cannot see this new sqlContext.

Anyway, anyone out there experiencing this problem, do NOT initialize 
sqlContext and it works fine.  
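
A two-line illustration of the pitfall (hypothetical, purely to show the shadowing):

{code}
%pyspark
from pyspark.sql import SQLContext
# Don't do this in Zeppelin: it shadows the injected sqlContext, and the
# interpreter can no longer see the context it created for you.
sqlContext = SQLContext(sc)
{code}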



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-395) Support Spark 1.6.0

2015-11-05 Thread Felix Cheung (JIRA)
Felix Cheung created ZEPPELIN-395:
-

 Summary: Support Spark 1.6.0
 Key: ZEPPELIN-395
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-395
 Project: Zeppelin
  Issue Type: Bug
Reporter: Felix Cheung
Assignee: Felix Cheung


There are several changes related to this coming release of Spark



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-399) Display Zeppelin build number on home page and add an API for programmatic checks

2015-11-06 Thread Felix Cheung (JIRA)
Felix Cheung created ZEPPELIN-399:
-

 Summary: Display Zeppelin build number on home page and add 
an API for programmatic checks
 Key: ZEPPELIN-399
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-399
 Project: Zeppelin
  Issue Type: Bug
  Components: GUI
Reporter: Felix Cheung
Priority: Minor


It would be nice to be able to check which build of Zeppelin I'm running.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-425) Dynamic form: ZeppelinContext input() value should be available in get()

2015-11-15 Thread Felix Cheung (JIRA)
Felix Cheung created ZEPPELIN-425:
-

 Summary: Dynamic form: ZeppelinContext input() value should be 
available in get()
 Key: ZEPPELIN-425
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-425
 Project: Zeppelin
  Issue Type: Bug
Reporter: Felix Cheung
Priority: Minor


It would be nice to be able to do:

{code}
z.input("foo")
... // later
z.get("foo")
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-428) Support Python programmatic access to dynamic form

2015-11-15 Thread Felix Cheung (JIRA)
Felix Cheung created ZEPPELIN-428:
-

 Summary: Support Python programmatic access to dynamic form
 Key: ZEPPELIN-428
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-428
 Project: Zeppelin
  Issue Type: Bug
  Components: Interpreters
Affects Versions: 0.5.5
Reporter: Felix Cheung
Assignee: Felix Cheung
Priority: Minor


This is great feedback from the local Zeppelin community.
People are going to Scala to programmatically access ZeppelinContext z.input() 
and z.select(), and then passing the values back to Python to use in their code.
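
A sketch of what the Python-side API could look like, mirroring the Scala ZeppelinContext calls named above (the form names and options here are illustrative):

{code}
%pyspark
# Dynamic forms driven programmatically from Python instead of Scala
name = z.input("name", "default value")
choice = z.select("choice", [("1", "one"), ("2", "two")], "1")
print(name, choice)
{code}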



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-436) Broken link to Spark in Dynamic Form doc

2015-11-17 Thread Felix Cheung (JIRA)
Felix Cheung created ZEPPELIN-436:
-

 Summary: Broken link to Spark in Dynamic Form doc
 Key: ZEPPELIN-436
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-436
 Project: Zeppelin
  Issue Type: Bug
Reporter: Felix Cheung


https://zeppelin.incubator.apache.org/docs/manual/dynamicform.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-461) Typos in geode doc

2015-11-25 Thread Felix Cheung (JIRA)
Felix Cheung created ZEPPELIN-461:
-

 Summary: Typos in geode doc
 Key: ZEPPELIN-461
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-461
 Project: Zeppelin
  Issue Type: Bug
  Components: Interpreters
Affects Versions: 0.5.6
Reporter: Felix Cheung
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-466) Typos in PostgreSQL doc

2015-11-25 Thread Felix Cheung (JIRA)
Felix Cheung created ZEPPELIN-466:
-

 Summary: Typos in PostgreSQL doc
 Key: ZEPPELIN-466
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-466
 Project: Zeppelin
  Issue Type: Bug
  Components: Interpreters
Affects Versions: 0.5.5
Reporter: Felix Cheung
Assignee: Felix Cheung
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-494) PySpark Interpreter should check SPARK_HOME since PySpark requires it to be set in the environment

2015-12-08 Thread Felix Cheung (JIRA)
Felix Cheung created ZEPPELIN-494:
-

 Summary: PySpark Interpreter should check SPARK_HOME since PySpark 
requires it to be set in the environment
 Key: ZEPPELIN-494
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-494
 Project: Zeppelin
  Issue Type: Bug
Reporter: Felix Cheung
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (ZEPPELIN-2058) Reduce test matrix on Travis

2017-02-04 Thread Felix Cheung (JIRA)
Felix Cheung created ZEPPELIN-2058:
--

 Summary: Reduce test matrix on Travis
 Key: ZEPPELIN-2058
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-2058
 Project: Zeppelin
  Issue Type: Bug
  Components: build
Reporter: Felix Cheung


We have 11 profiles in the Travis matrix and tests are running for a long time. 
We should consider streamlining it:

- do we really support that many versions of Spark? how about just 1.6.x, 2.0.x 
and 2.1.x?
- could we merge the python 2 and 3 tests into other profiles?
- could we merge the Livy test into another profile?




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (ZEPPELIN-4026) Doc should warn about anonymous access

2019-03-02 Thread Felix Cheung (JIRA)
Felix Cheung created ZEPPELIN-4026:
--

 Summary: Doc should warn about anonymous access
 Key: ZEPPELIN-4026
 URL: https://issues.apache.org/jira/browse/ZEPPELIN-4026
 Project: Zeppelin
  Issue Type: Bug
Reporter: Felix Cheung






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (GEARPUMP-56) Add gearpump interpreter for apache zeppelin

2016-04-21 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/GEARPUMP-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15253125#comment-15253125
 ] 

Felix Cheung commented on GEARPUMP-56:
--

Hi - I'm a committer of Zeppelin - would love to help!

> Add gearpump interpreter for apache zeppelin
> 
>
> Key: GEARPUMP-56
> URL: https://issues.apache.org/jira/browse/GEARPUMP-56
> Project: Apache Gearpump
>  Issue Type: New Feature
>  Components: interactive
>Affects Versions: 0.8.0
>Reporter: Kam Kasravi
> Fix For: 0.8.1
>
>
> Similar to what flink has done 
> https://zeppelin.incubator.apache.org/docs/0.5.5-incubating/interpreter/flink.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (GEARPUMP-56) Add gearpump interpreter for apache zeppelin

2016-04-25 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/GEARPUMP-56?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15257243#comment-15257243
 ] 

Felix Cheung commented on GEARPUMP-56:
--

ZEPPELIN-827
https://issues.apache.org/jira/browse/ZEPPELIN-827

> Add gearpump interpreter for apache zeppelin
> 
>
> Key: GEARPUMP-56
> URL: https://issues.apache.org/jira/browse/GEARPUMP-56
> Project: Apache Gearpump
>  Issue Type: New Feature
>  Components: interactive
>Affects Versions: 0.8.0
>Reporter: Kam Kasravi
>Assignee: Felix Cheung
> Fix For: 0.8.1
>
>
> Similar to what flink has done 
> https://zeppelin.incubator.apache.org/docs/0.5.5-incubating/interpreter/flink.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (WHIMSY-337) roster - LDAP sync issue

2020-07-24 Thread Felix Cheung (Jira)
Felix Cheung created WHIMSY-337:
---

 Summary: roster - LDAP sync issue
 Key: WHIMSY-337
 URL: https://issues.apache.org/jira/browse/WHIMSY-337
 Project: Whimsy
  Issue Type: Bug
Reporter: Felix Cheung


I made a roster change; I saw the email days ago, but LDAP has not changed.

 

[https://lists.apache.org/thread.html/rb452a0d3d67eb04071e637ee3a9cc98050aaea9af427ab716b4e2c9e%40%3Cprivate.incubator.apache.org%3E]

 

[https://whimsy.apache.org/roster/ppmc/superset]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Closed] (WHIMSY-337) roster - LDAP sync issue

2020-07-26 Thread Felix Cheung (Jira)


 [ 
https://issues.apache.org/jira/browse/WHIMSY-337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung closed WHIMSY-337.
---
Resolution: Fixed

> roster - LDAP sync issue
> 
>
> Key: WHIMSY-337
> URL: https://issues.apache.org/jira/browse/WHIMSY-337
> Project: Whimsy
>  Issue Type: Bug
>Reporter: Felix Cheung
>Priority: Major
>
> I made a roster change; I saw the email days ago, but LDAP has not changed.
>  
> [https://lists.apache.org/thread.html/rb452a0d3d67eb04071e637ee3a9cc98050aaea9af427ab716b4e2c9e%40%3Cprivate.incubator.apache.org%3E]
>  
> [https://whimsy.apache.org/roster/ppmc/superset]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (SPARK-16090) Improve method grouping in SparkR generated docs

2016-06-21 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15342474#comment-15342474
 ] 

Felix Cheung commented on SPARK-16090:
--

Sounds like we only need to review the doc for gapply - which doesn't have a 
task yet. [~Narine] - if you pull up the generated HTML, it looks kind of 
confusing with the same parameter names and different descriptions.

> Improve method grouping in SparkR generated docs
> 
>
> Key: SPARK-16090
> URL: https://issues.apache.org/jira/browse/SPARK-16090
> Project: Spark
>  Issue Type: Umbrella
>  Components: Documentation, SparkR
>Affects Versions: 2.0.0
>Reporter: Xiangrui Meng
>Priority: Critical
>
> This JIRA follows the discussion on 
> https://github.com/apache/spark/pull/13109 to improve method grouping in 
> SparkR generated docs. Having one method per doc page is not an R convention. 
> However, having many methods per doc page would hurt the readability. So a 
> proper grouping would help. Since we use roxygen2 instead of writing Rd files 
> directly, we should consider smaller groups to avoid confusion. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16090) Improve method grouping in SparkR generated docs

2016-06-21 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15342696#comment-15342696
 ] 

Felix Cheung commented on SPARK-16090:
--

This is for example the html output for gapply

{code}
# S4 method for signature 'GroupedData'
gapply(x, func, schema)

## S4 method for signature 'SparkDataFrame'
gapply(x, cols, func, schema)



Arguments


x

a GroupedData

func

A function to be applied to each group partition specified by GroupedData.
The function 'func' takes as argument a key - grouping columns and
a data frame - a local R data.frame.
The output of 'func' is a local R data.frame.

schema

The schema of the resulting SparkDataFrame after the function is applied.
The schema must match to output of 'func'. It has to be defined for each
output column with preferred output column name and corresponding data type.

cols

Grouping columns

x

A SparkDataFrame

func

A function to be applied to each group partition specified by grouping
column of the SparkDataFrame. The function 'func' takes as argument
a key - grouping columns and a data frame - a local R data.frame.
The output of 'func' is a local R data.frame.

schema

The schema of the resulting SparkDataFrame after the function is applied.
The schema must match to output of 'func'. It has to be defined for each
output column with preferred output column name and corresponding data type.


{code}

As you can see, func and schema are listed twice with different wording under 
Arguments.
We should see if we could explain it one way and list them once only. (ie. one 
copy of "@param func")


> Improve method grouping in SparkR generated docs
> 
>
> Key: SPARK-16090
> URL: https://issues.apache.org/jira/browse/SPARK-16090
> Project: Spark
>  Issue Type: Umbrella
>  Components: Documentation, SparkR
>Affects Versions: 2.0.0
>Reporter: Xiangrui Meng
>Priority: Critical
>
> This JIRA follows the discussion on 
> https://github.com/apache/spark/pull/13109 to improve method grouping in 
> SparkR generated docs. Having one method per doc page is not an R convention. 
> However, having many methods per doc page would hurt the readability. So a 
> proper grouping would help. Since we use roxygen2 instead of writing Rd files 
> directly, we should consider smaller groups to avoid confusion. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-16090) Improve method grouping in SparkR generated docs

2016-06-21 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15342696#comment-15342696
 ] 

Felix Cheung edited comment on SPARK-16090 at 6/21/16 9:08 PM:
---

This is for example the html output for gapply

{code}
# S4 method for signature 'GroupedData'
gapply(x, func, schema)

## S4 method for signature 'SparkDataFrame'
gapply(x, cols, func, schema)



Arguments


x

a GroupedData

func

A function to be applied to each group partition specified by GroupedData.
The function 'func' takes as argument a key - grouping columns and
a data frame - a local R data.frame.
The output of 'func' is a local R data.frame.

schema

The schema of the resulting SparkDataFrame after the function is applied.
The schema must match to output of 'func'. It has to be defined for each
output column with preferred output column name and corresponding data type.

cols

Grouping columns

x

A SparkDataFrame

func

A function to be applied to each group partition specified by grouping
column of the SparkDataFrame. The function 'func' takes as argument
a key - grouping columns and a data frame - a local R data.frame.
The output of 'func' is a local R data.frame.

schema

The schema of the resulting SparkDataFrame after the function is applied.
The schema must match to output of 'func'. It has to be defined for each
output column with preferred output column name and corresponding data type.


{code}

As you can see, func and schema (and x) are listed twice with different wording 
under Arguments.
We should see if we could explain it one way and list them once only. (ie. one 
copy of "@param func")



was (Author: felixcheung):
This is for example the html output for gapply

{code}
# S4 method for signature 'GroupedData'
gapply(x, func, schema)

## S4 method for signature 'SparkDataFrame'
gapply(x, cols, func, schema)



Arguments


x

a GroupedData

func

A function to be applied to each group partition specified by GroupedData.
The function 'func' takes as argument a key - grouping columns and
a data frame - a local R data.frame.
The output of 'func' is a local R data.frame.

schema

The schema of the resulting SparkDataFrame after the function is applied.
The schema must match to output of 'func'. It has to be defined for each
output column with preferred output column name and corresponding data type.

cols

Grouping columns

x

A SparkDataFrame

func

A function to be applied to each group partition specified by grouping
column of the SparkDataFrame. The function 'func' takes as argument
a key - grouping columns and a data frame - a local R data.frame.
The output of 'func' is a local R data.frame.

schema

The schema of the resulting SparkDataFrame after the function is applied.
The schema must match to output of 'func'. It has to be defined for each
output column with preferred output column name and corresponding data type.


{code}

As you can see, func and schema are listed twice with different wording under 
Arguments.
We should see if we could explain it one way and list them once only. (ie. one 
copy of "@param func")


> Improve method grouping in SparkR generated docs
> 
>
> Key: SPARK-16090
> URL: https://issues.apache.org/jira/browse/SPARK-16090
> Project: Spark
>  Issue Type: Umbrella
>  Components: Documentation, SparkR
>Affects Versions: 2.0.0
>Reporter: Xiangrui Meng
>Priority: Critical
>
> This JIRA follows the discussion on 
> https://github.com/apache/spark/pull/13109 to improve method grouping in 
> SparkR generated docs. Having one method per doc page is not an R convention. 
> However, having many methods per doc page would hurt the readability. So a 
> proper grouping would help. Since we use roxygen2 instead of writing Rd files 
> directly, we should consider smaller groups to avoid confusion. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16088) Deprecate setJobGroup, clearJobGroup, cancelJobGroup from SparkR API

2016-06-21 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15343807#comment-15343807
 ] 

Felix Cheung commented on SPARK-16088:
--

Right, since they are S3 methods there really isn't any function overloading 
here. A big part of the work is going to be inspecting the class of the first 
parameter and then patching up the call, and also tricking roxygen2 into 
generating docs for the right parameter list.

I'm testing the fix now.


> Deprecate setJobGroup, clearJobGroup, cancelJobGroup from SparkR API
> 
>
> Key: SPARK-16088
> URL: https://issues.apache.org/jira/browse/SPARK-16088
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.6.1
>Reporter: Felix Cheung
>
> Since they use SparkContext?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16088) Update setJobGroup, clearJobGroup, cancelJobGroup SparkR API to not require sc

2016-06-21 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-16088:
-
Summary: Update setJobGroup, clearJobGroup, cancelJobGroup SparkR API to 
not require sc  (was: Deprecate setJobGroup, clearJobGroup, cancelJobGroup from 
SparkR API)

> Update setJobGroup, clearJobGroup, cancelJobGroup SparkR API to not require sc
> --
>
> Key: SPARK-16088
> URL: https://issues.apache.org/jira/browse/SPARK-16088
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.6.1
>Reporter: Felix Cheung
>
> Since they use SparkContext?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15124) R 2.0 QA: New R APIs and API docs

2016-06-21 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15343810#comment-15343810
 ] 

Felix Cheung commented on SPARK-15124:
--

I think  both of these are updated now.

> R 2.0 QA: New R APIs and API docs
> -
>
> Key: SPARK-15124
> URL: https://issues.apache.org/jira/browse/SPARK-15124
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, SparkR
>Reporter: Joseph K. Bradley
>Priority: Blocker
>
> Audit new public R APIs.  Take note of:
> * Correctness and uniformity of API
> * Documentation: Missing?  Bad links or formatting?
> ** Check both the generated docs linked from the user guide and the R command 
> line docs `?read.df`. These are generated using roxygen.
> As you find issues, please create JIRAs and link them to this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16184) Support SparkSession.conf API in SparkR

2016-06-23 Thread Felix Cheung (JIRA)
Felix Cheung created SPARK-16184:


 Summary: Support SparkSession.conf API in SparkR
 Key: SPARK-16184
 URL: https://issues.apache.org/jira/browse/SPARK-16184
 Project: Spark
  Issue Type: Bug
  Components: SparkR
Affects Versions: 2.0.0
Reporter: Felix Cheung






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-16310) SparkR csv source should have the same default na.string as R

2016-06-29 Thread Felix Cheung (JIRA)
Felix Cheung created SPARK-16310:


 Summary: SparkR csv source should have the same default na.string 
as R
 Key: SPARK-16310
 URL: https://issues.apache.org/jira/browse/SPARK-16310
 Project: Spark
  Issue Type: Bug
  Components: SparkR
Affects Versions: 1.6.2
Reporter: Felix Cheung
Priority: Minor


https://stat.ethz.ch/R-manual/R-devel/library/utils/html/read.table.html
na.strings = "NA"




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16308) SparkR csv source should have the same default na.string as R

2016-06-29 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15355989#comment-15355989
 ] 

Felix Cheung commented on SPARK-16308:
--

dup of https://issues.apache.org/jira/browse/SPARK-16310


> SparkR csv source should have the same default na.string as R
> -
>
> Key: SPARK-16308
> URL: https://issues.apache.org/jira/browse/SPARK-16308
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.6.2
>Reporter: Felix Cheung
>Priority: Minor
>
> https://stat.ethz.ch/R-manual/R-devel/library/utils/html/read.table.html
> na.strings = "NA"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16309) SparkR csv source should have the same default na.string as R

2016-06-29 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15355988#comment-15355988
 ] 

Felix Cheung commented on SPARK-16309:
--

dup of https://issues.apache.org/jira/browse/SPARK-16310


> SparkR csv source should have the same default na.string as R
> -
>
> Key: SPARK-16309
> URL: https://issues.apache.org/jira/browse/SPARK-16309
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.6.2
>Reporter: Felix Cheung
>Priority: Minor
>
> https://stat.ethz.ch/R-manual/R-devel/library/utils/html/read.table.html
> na.strings = "NA"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-16308) SparkR csv source should have the same default na.string as R

2016-06-29 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung resolved SPARK-16308.
--
Resolution: Duplicate

> SparkR csv source should have the same default na.string as R
> -
>
> Key: SPARK-16308
> URL: https://issues.apache.org/jira/browse/SPARK-16308
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.6.2
>Reporter: Felix Cheung
>Priority: Minor
>
> https://stat.ethz.ch/R-manual/R-devel/library/utils/html/read.table.html
> na.strings = "NA"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-16308) SparkR csv source should have the same default na.string as R

2016-06-29 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung closed SPARK-16308.


> SparkR csv source should have the same default na.string as R
> -
>
> Key: SPARK-16308
> URL: https://issues.apache.org/jira/browse/SPARK-16308
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.6.2
>Reporter: Felix Cheung
>Priority: Minor
>
> https://stat.ethz.ch/R-manual/R-devel/library/utils/html/read.table.html
> na.strings = "NA"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16309) SparkR csv source should have the same default na.string as R

2016-06-29 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15355991#comment-15355991
 ] 

Felix Cheung commented on SPARK-16309:
--

could someone please close this bug (I can't)?

> SparkR csv source should have the same default na.string as R
> -
>
> Key: SPARK-16309
> URL: https://issues.apache.org/jira/browse/SPARK-16309
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.6.2
>Reporter: Felix Cheung
>Priority: Minor
>
> https://stat.ethz.ch/R-manual/R-devel/library/utils/html/read.table.html
> na.strings = "NA"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16308) SparkR csv source should have the same default na.string as R

2016-06-29 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15355992#comment-15355992
 ] 

Felix Cheung commented on SPARK-16308:
--

could someone please close this bug (I can't)?

> SparkR csv source should have the same default na.string as R
> -
>
> Key: SPARK-16308
> URL: https://issues.apache.org/jira/browse/SPARK-16308
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.6.2
>Reporter: Felix Cheung
>Priority: Minor
>
> https://stat.ethz.ch/R-manual/R-devel/library/utils/html/read.table.html
> na.strings = "NA"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16144) Add a separate Rd for ML generic methods: read.ml, write.ml, summary, predict

2016-06-30 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15356746#comment-15356746
 ] 

Felix Cheung commented on SPARK-16144:
--

Sorry, I started on this before realizing there's a JIRA. There are already separate 
pages for read.ml and summary. However, the summary page carries the content for 
summary(SparkDataFrame), so it's not clear where summary(model) should go - we 
could add a page for summary.ml?

> Add a separate Rd for ML generic methods: read.ml, write.ml, summary, predict
> -
>
> Key: SPARK-16144
> URL: https://issues.apache.org/jira/browse/SPARK-16144
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, MLlib, SparkR
>Affects Versions: 2.0.0
>Reporter: Xiangrui Meng
>Assignee: Yanbo Liang
>
> After we grouped generic methods by the algorithm, it would be nice to add a 
> separate Rd for each ML generic method, in particular, write.ml, read.ml, 
> summary, and predict and link the implementations with seealso.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16326) Evaluate sparklyr package from RStudio

2016-07-01 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15358582#comment-15358582
 ] 

Felix Cheung commented on SPARK-16326:
--

Interesting. I'm very surprised by the amount of effort going into this package.

> Evaluate sparklyr package from RStudio
> --
>
> Key: SPARK-16326
> URL: https://issues.apache.org/jira/browse/SPARK-16326
> Project: Spark
>  Issue Type: Brainstorming
>  Components: SparkR
>Reporter: Sun Rui
>
> RStudio has developed sparklyr (https://github.com/rstudio/sparklyr), 
> connecting the R community to Spark. A rough review shows that sparklyr provides 
> a dplyr backend and a new API for MLlib and for calling Spark from R. Of 
> course, sparklyr internally uses the low-level mechanisms in SparkR.
> We can discuss how to position SparkR with sparklyr.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16144) Add a separate Rd for ML generic methods: read.ml, write.ml, summary, predict

2016-07-01 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15358611#comment-15358611
 ] 

Felix Cheung commented on SPARK-16144:
--

I think names like predict and summary are familiar to R users.
In terms of usage I don't think there is a conflict with summary(SparkDataFrame) 
- for documentation we could easily have a page under summary.ml and cross-link 
it with summary(SparkDataFrame) for discoverability.

It might affect what in-line help users get when they do ?summary, though. 

> Add a separate Rd for ML generic methods: read.ml, write.ml, summary, predict
> -
>
> Key: SPARK-16144
> URL: https://issues.apache.org/jira/browse/SPARK-16144
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, MLlib, SparkR
>Affects Versions: 2.0.0
>Reporter: Xiangrui Meng
>Assignee: Yanbo Liang
>
> After we grouped generic methods by the algorithm, it would be nice to add a 
> separate Rd for each ML generic method, in particular, write.ml, read.ml, 
> summary, and predict and link the implementations with seealso.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16144) Add a separate Rd for ML generic methods: read.ml, write.ml, summary, predict

2016-07-02 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15360170#comment-15360170
 ] 

Felix Cheung commented on SPARK-16144:
--

It should be possible to have a separate doc page but keep the function name:
https://stat.ethz.ch/R-manual/R-devel/library/stats/html/summary.lm.html
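
A hedged roxygen sketch of that approach - keep the method name but route it to its
own Rd page via @rdname (the class and body below are illustrative stand-ins):

{code}
setClass("KMeansModel", representation(jobj = "ANY"))   # stand-in for the real class

#' @rdname summary.KMeansModel
#' @title summary method for a fitted k-means model
setMethod("summary", signature(object = "KMeansModel"),
          function(object, ...) {
            # roxygen writes summary.KMeansModel.Rd; users still call summary(model)
            list()
          })
{code}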


> Add a separate Rd for ML generic methods: read.ml, write.ml, summary, predict
> -
>
> Key: SPARK-16144
> URL: https://issues.apache.org/jira/browse/SPARK-16144
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, MLlib, SparkR
>Affects Versions: 2.0.0
>Reporter: Xiangrui Meng
>Assignee: Yanbo Liang
>
> After we grouped generic methods by the algorithm, it would be nice to add a 
> separate Rd for each ML generic method, in particular, write.ml, read.ml, 
> summary, and predict and link the implementations with seealso.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16509) Rename window.partitionBy and window.orderBy

2016-07-13 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375994#comment-15375994
 ] 

Felix Cheung commented on SPARK-16509:
--

These were added in 2.0.0

> Rename window.partitionBy and window.orderBy
> 
>
> Key: SPARK-16509
> URL: https://issues.apache.org/jira/browse/SPARK-16509
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Reporter: Shivaram Venkataraman
>
> Right now R CMD check [1] interprets window.partitionBy and window.orderBy as 
> S3 functions defined on the "partitionBy" class or "orderBy" class (similar 
> to, say, summary.lm).
> To avoid confusion I think we should just rename the functions and not use 
> `.` in them?
> cc [~sunrui]
> [1] https://gist.github.com/shivaram/62866c4ca59c5d34b8963939cf04b5eb
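
A hedged sketch of what the rename could look like on the caller side (the new
names below are illustrative, simply dropping the "."):

{code}
ws <- windowPartitionBy("key")   # was: window.partitionBy("key")
ws <- windowOrderBy("value")     # was: window.orderBy("value")
{code}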



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14816) Update MLlib, GraphX, SparkR websites for 2.0

2016-07-13 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375996#comment-15375996
 ] 

Felix Cheung commented on SPARK-14816:
--

This was added in the SparkR programming guide.

> Update MLlib, GraphX, SparkR websites for 2.0
> -
>
> Key: SPARK-14816
> URL: https://issues.apache.org/jira/browse/SPARK-14816
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, GraphX, ML, MLlib, SparkR
>Reporter: Joseph K. Bradley
>Priority: Critical
>
> Update the sub-projects' websites to include new features in this release.
> For MLlib, make it clear that the DataFrame-based API is the primary one now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16510) Move SparkR test JAR into Spark, include its source code

2016-07-13 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376000#comment-15376000
 ] 

Felix Cheung commented on SPARK-16510:
--

I suspect (a) is problematic, though - can we assume javac is available to build 
the jar from source?
Maybe skipping the test or downloading the jar is easier?
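
A hedged sketch of the skip-test option using testthat (the jar path below is
purely illustrative):

{code}
library(testthat)
test_that("test jar is available", {
  # skip in environments (e.g. CRAN checks) where the jar cannot be built or found
  skip_if_not(nzchar(Sys.which("javac")), "javac not available to build the test jar")
  jarPath <- file.path(Sys.getenv("SPARK_HOME"), "R", "lib", "sparkr-test.jar")
  expect_true(file.exists(jarPath))
})
{code}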


> Move SparkR test JAR into Spark, include its source code
> 
>
> Key: SPARK-16510
> URL: https://issues.apache.org/jira/browse/SPARK-16510
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Reporter: Shivaram Venkataraman
>
> One of the `NOTE`s in the R CMD check is that we currently include a test JAR 
> file in SparkR which is a binary only artifact. I think we can take two steps 
> to address this
> (a) I think we should include the source code for this in say core/src/test/ 
> or something like that. As far as I know the JAR file just needs to have a 
> single method. 
> (b) We should move the JAR file out of the SparkR test support and into some 
> other location in Spark. The trouble is that it's tricky to run the test with 
> CRAN mode then. We could either disable the test for CRAN or download the JAR 
> from an external URL



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16519) Handle SparkR RDD generics that create warnings in R CMD check

2016-07-13 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376004#comment-15376004
 ] 

Felix Cheung commented on SPARK-16519:
--

+1 on removing RDD APIs...
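
For context, a minimal sketch of the R behavior described in the issue below -
exporting a generic exposes every S4 method registered on it, which R CMD check
then wants documented:

{code}
setGeneric("collect", function(x, ...) standardGeneric("collect"))
setClass("RDD", representation(jrdd = "character"))
setMethod("collect", signature(x = "RDD"), function(x, ...) x@jrdd)

# A NAMESPACE entry `exportMethods("collect")` exports the generic together with
# the RDD method above, so R CMD check flags the RDD method as an undocumented
# S4 method even when the RDD class is meant to stay internal.
{code}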

> Handle SparkR RDD generics that create warnings in R CMD check
> --
>
> Key: SPARK-16519
> URL: https://issues.apache.org/jira/browse/SPARK-16519
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Reporter: Shivaram Venkataraman
>
> One of the warnings we get from R CMD check is that RDD implementations of 
> some of the generics are not documented. These generics are shared between 
> RDD, DataFrames in SparkR. The list includes
> {quote}
> WARNING
> Undocumented S4 methods:
>   generic 'cache' and siglist 'RDD'
>   generic 'collect' and siglist 'RDD'
>   generic 'count' and siglist 'RDD'
>   generic 'distinct' and siglist 'RDD'
>   generic 'first' and siglist 'RDD'
>   generic 'join' and siglist 'RDD,RDD'
>   generic 'length' and siglist 'RDD'
>   generic 'partitionBy' and siglist 'RDD'
>   generic 'persist' and siglist 'RDD,character'
>   generic 'repartition' and siglist 'RDD'
>   generic 'show' and siglist 'RDD'
>   generic 'take' and siglist 'RDD,numeric'
>   generic 'unpersist' and siglist 'RDD'
> {quote}
> As described in 
> https://stat.ethz.ch/pipermail/r-devel/2003-September/027490.html this looks 
> like a limitation of R where exporting a generic from a package also exports 
> all the implementations of that generic. 
> One way to get around this is to remove the RDD API or rename the methods in 
> Spark 2.1



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16309) SparkR csv source should have the same default na.string as R

2016-07-13 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-16309:
-
Affects Version/s: (was: 1.6.2)

> SparkR csv source should have the same default na.string as R
> -
>
> Key: SPARK-16309
> URL: https://issues.apache.org/jira/browse/SPARK-16309
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Reporter: Felix Cheung
>Priority: Minor
>
> https://stat.ethz.ch/R-manual/R-devel/library/utils/html/read.table.html
> na.strings = "NA"



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16510) Move SparkR test JAR into Spark, include its source code

2016-07-13 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376273#comment-15376273
 ] 

Felix Cheung commented on SPARK-16510:
--

I see. I think Jenkins optimizes what to build, so it might not build the jar? I 
agree uploading the jar might be ideal.

> Move SparkR test JAR into Spark, include its source code
> 
>
> Key: SPARK-16510
> URL: https://issues.apache.org/jira/browse/SPARK-16510
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Reporter: Shivaram Venkataraman
>
> One of the `NOTE`s in the R CMD check is that we currently include a test JAR 
> file in SparkR which is a binary only artifact. I think we can take two steps 
> to address this
> (a) I think we should include the source code for this in say core/src/test/ 
> or something like that. As far as I know the JAR file just needs to have a 
> single method. 
> (b) We should move the JAR file out of the SparkR test support and into some 
> other location in Spark. The trouble is that it's tricky to run the test with 
> CRAN mode then. We could either disable the test for CRAN or download the JAR 
> from an external URL



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15124) R 2.0 QA: New R APIs and API docs

2016-07-13 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376274#comment-15376274
 ] 

Felix Cheung commented on SPARK-15124:
--

I think the main ones are covered, in this and a bunch of other PRs.


> R 2.0 QA: New R APIs and API docs
> -
>
> Key: SPARK-15124
> URL: https://issues.apache.org/jira/browse/SPARK-15124
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, SparkR
>Reporter: Joseph K. Bradley
>Assignee: Yanbo Liang
>Priority: Blocker
> Fix For: 2.0.0
>
>
> Audit new public R APIs.  Take note of:
> * Correctness and uniformity of API
> * Documentation: Missing?  Bad links or formatting?
> ** Check both the generated docs linked from the user guide and the R command 
> line docs `?read.df`. These are generated using roxygen.
> As you find issues, please create JIRAs and link them to this issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16538) Cannot use "SparkR::sql"

2016-07-13 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376275#comment-15376275
 ] 

Felix Cheung commented on SPARK-16538:
--

Investigating...

> Cannot use "SparkR::sql"
> 
>
> Key: SPARK-16538
> URL: https://issues.apache.org/jira/browse/SPARK-16538
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.6.2, 2.0.0
>Reporter: Weiluo Ren
>Priority: Critical
>
> When calling "SparkR::sql", an error pops up. For instance
> {code}
> SparkR::sql("")
> Error in get(paste0(funcName, ".default")) :
>  object '::.default' not found
> {code}
> https://github.com/apache/spark/blob/f4767bcc7a9d1bdd301f054776aa45e7c9f344a7/R/pkg/R/SQLContext.R#L51
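
The root cause is visible in the dispatch pattern at that line: the function name is
recovered from the call, and a qualified call parses as a call to `::`. A minimal
repro sketch (illustrative, not the actual SparkR source):

{code}
sql <- function(...) {
  funcName <- as.character(sys.call()[[1]])[[1]]   # "sql" normally, "::" for SparkR::sql(...)
  f <- get(paste0(funcName, ".default"))
  f(...)
}
sql.default <- function(x) paste("running:", x)

sql("select 1")      # works: finds sql.default
# SparkR::sql(...)   # fails: looks up "::.default" -> object '::.default' not found
{code}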



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-16510) Move SparkR test JAR into Spark, include its source code

2016-07-13 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376273#comment-15376273
 ] 

Felix Cheung edited comment on SPARK-16510 at 7/14/16 3:55 AM:
---

I see. I think Jenkins optimizes what to build, so it might not build the jar 
before building the R code and running the R tests? I agree uploading the jar 
might be ideal.


was (Author: felixcheung):
I see. I think Jenkins optimizes what to build, so it might not build the jar? I 
agree uploading the jar might be ideal.

> Move SparkR test JAR into Spark, include its source code
> 
>
> Key: SPARK-16510
> URL: https://issues.apache.org/jira/browse/SPARK-16510
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Reporter: Shivaram Venkataraman
>
> One of the `NOTE`s in the R CMD check is that we currently include a test JAR 
> file in SparkR which is a binary only artifact. I think we can take two steps 
> to address this
> (a) I think we should include the source code for this in say core/src/test/ 
> or something like that. As far as I know the JAR file just needs to have a 
> single method. 
> (b) We should move the JAR file out of the SparkR test support and into some 
> other location in Spark. The trouble is that it's tricky to run the test with 
> CRAN mode then. We could either disable the test for CRAN or download the JAR 
> from an external URL



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16538) Cannot use "SparkR::sql"

2016-07-13 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376313#comment-15376313
 ] 

Felix Cheung commented on SPARK-16538:
--

Thanks for reporting this!

> Cannot use "SparkR::sql"
> 
>
> Key: SPARK-16538
> URL: https://issues.apache.org/jira/browse/SPARK-16538
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.6.2, 2.0.0
>Reporter: Weiluo Ren
>Priority: Critical
>
> When calling "SparkR::sql", an error pops up. For instance
> {code}
> SparkR::sql("")
> Error in get(paste0(funcName, ".default")) :
>  object '::.default' not found
> {code}
> https://github.com/apache/spark/blob/f4767bcc7a9d1bdd301f054776aa45e7c9f344a7/R/pkg/R/SQLContext.R#L51



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16538) Cannot use "SparkR::sql"

2016-07-14 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-16538:
-
Fix Version/s: (was: 1.6.3)

> Cannot use "SparkR::sql"
> 
>
> Key: SPARK-16538
> URL: https://issues.apache.org/jira/browse/SPARK-16538
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.6.2, 2.0.0
>Reporter: Weiluo Ren
>Assignee: Felix Cheung
>Priority: Critical
> Fix For: 2.0.0
>
>
> When calling "SparkR::sql", an error pops up. For instance
> {code}
> SparkR::sql("")
> Error in get(paste0(funcName, ".default")) :
>  object '::.default' not found
> {code}
> https://github.com/apache/spark/blob/f4767bcc7a9d1bdd301f054776aa45e7c9f344a7/R/pkg/R/SQLContext.R#L51



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16538) Cannot use "SparkR::sql"

2016-07-14 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378054#comment-15378054
 ] 

Felix Cheung commented on SPARK-16538:
--

Remove Fix 1.6.3 since SPARK-10903 was not in Branch-1.6
https://github.com/apache/spark/blob/branch-1.6/R/pkg/R/SQLContext.R

> Cannot use "SparkR::sql"
> 
>
> Key: SPARK-16538
> URL: https://issues.apache.org/jira/browse/SPARK-16538
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.6.2, 2.0.0
>Reporter: Weiluo Ren
>Assignee: Felix Cheung
>Priority: Critical
> Fix For: 2.0.0
>
>
> When calling "SparkR::sql", an error pops up. For instance
> {code}
> SparkR::sql("")
> Error in get(paste0(funcName, ".default")) :
>  object '::.default' not found
> {code}
> https://github.com/apache/spark/blob/f4767bcc7a9d1bdd301f054776aa45e7c9f344a7/R/pkg/R/SQLContext.R#L51



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-16538) Cannot use "SparkR::sql"

2016-07-14 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378054#comment-15378054
 ] 

Felix Cheung edited comment on SPARK-16538 at 7/14/16 6:23 PM:
---

Remove Fix Version 1.6.3 since SPARK-10903 was not in Branch-1.6
https://github.com/apache/spark/blob/branch-1.6/R/pkg/R/SQLContext.R

This should not be merged to Branch-1.6


was (Author: felixcheung):
Remove Fix 1.6.3 since SPARK-10903 was not in Branch-1.6
https://github.com/apache/spark/blob/branch-1.6/R/pkg/R/SQLContext.R

> Cannot use "SparkR::sql"
> 
>
> Key: SPARK-16538
> URL: https://issues.apache.org/jira/browse/SPARK-16538
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 1.6.2, 2.0.0
>Reporter: Weiluo Ren
>Assignee: Felix Cheung
>Priority: Critical
> Fix For: 2.0.0
>
>
> When calling "SparkR::sql", an error pops up. For instance
> {code}
> SparkR::sql("")
> Error in get(paste0(funcName, ".default")) :
>  object '::.default' not found
> {code}
> https://github.com/apache/spark/blob/f4767bcc7a9d1bdd301f054776aa45e7c9f344a7/R/pkg/R/SQLContext.R#L51



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16538) Cannot use "SparkR::sql"

2016-07-14 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-16538:
-
Affects Version/s: (was: 1.6.2)

> Cannot use "SparkR::sql"
> 
>
> Key: SPARK-16538
> URL: https://issues.apache.org/jira/browse/SPARK-16538
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.0.0
>Reporter: Weiluo Ren
>Assignee: Felix Cheung
>Priority: Critical
> Fix For: 2.0.0
>
>
> When calling "SparkR::sql", an error pops up. For instance
> {code}
> SparkR::sql("")
> Error in get(paste0(funcName, ".default")) :
>  object '::.default' not found
> {code}
> https://github.com/apache/spark/blob/f4767bcc7a9d1bdd301f054776aa45e7c9f344a7/R/pkg/R/SQLContext.R#L51



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15799) Release SparkR on CRAN

2016-07-15 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380246#comment-15380246
 ] 

Felix Cheung commented on SPARK-15799:
--

re: higher-level R packages that depend on SparkR
- I think we need to expand on what might be required here. Should we support a 
higher-level R package running on a cluster with existing Spark & SparkR, or 
without an existing Spark installation?


> Release SparkR on CRAN
> --
>
> Key: SPARK-15799
> URL: https://issues.apache.org/jira/browse/SPARK-15799
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Reporter: Xiangrui Meng
>
> Story: "As an R user, I would like to see SparkR released on CRAN, so I can 
> use SparkR easily in an existing R environment and have other packages built 
> on top of SparkR."
> I made this JIRA with the following questions in mind:
> * Are there known issues that prevent us releasing SparkR on CRAN?
> * Do we want to package Spark jars in the SparkR release?
> * Are there license issues?
> * How does it fit into Spark's release process?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16508) Fix documentation warnings found by R CMD check

2016-07-15 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380436#comment-15380436
 ] 

Felix Cheung commented on SPARK-16508:
--

Sure, but I thought your PR already handles most of this, and there are other 
PRs for the RDD methods and so on?





On Fri, Jul 15, 2016 at 2:26 PM -0700, "Shivaram Venkataraman (JIRA)" 
mailto:j...@apache.org>> wrote:


[ 
https://issues.apache.org/jira/browse/SPARK-16508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380142#comment-15380142
 ]

Shivaram Venkataraman commented on SPARK-16508:
---

[~dongjoon] [~felixcheung] Would one of you be interested in contributing to 
this ? We could also split this into smaller PRs as there are quite a few 
warnings.

Also this is not required for 2.0, so if there are higher priority PRs we can 
get to this after 2.0 is released.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


> Fix documentation warnings found by R CMD check
> ---
>
> Key: SPARK-16508
> URL: https://issues.apache.org/jira/browse/SPARK-16508
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Reporter: Shivaram Venkataraman
>
> A full list of warnings after the fixes in SPARK-16507 is at 
> https://gist.github.com/shivaram/62866c4ca59c5d34b8963939cf04b5eb 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-16508) Fix documentation warnings found by R CMD check

2016-07-15 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380436#comment-15380436
 ] 

Felix Cheung edited comment on SPARK-16508 at 7/16/16 3:05 AM:
---

Sure, but I thought your PR already handles most of this, and there are other 
PRs for the RDD methods and so on?



was (Author: felixcheung):
Sure, but I thought your PR already handles most of this, and there are other 
PRs for the RDD methods and so on?





On Fri, Jul 15, 2016 at 2:26 PM -0700, "Shivaram Venkataraman (JIRA)" 
mailto:j...@apache.org>> wrote:


[ 
https://issues.apache.org/jira/browse/SPARK-16508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380142#comment-15380142
 ]

Shivaram Venkataraman commented on SPARK-16508:
---

[~dongjoon] [~felixcheung] Would one of you be interested in contributing to 
this ? We could also split this into smaller PRs as there are quite a few 
warnings.

Also this is not required for 2.0, so if there are higher priority PRs we can 
get to this after 2.0 is released.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


> Fix documentation warnings found by R CMD check
> ---
>
> Key: SPARK-16508
> URL: https://issues.apache.org/jira/browse/SPARK-16508
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Reporter: Shivaram Venkataraman
>
> A full list of warnings after the fixes in SPARK-16507 is at 
> https://gist.github.com/shivaram/62866c4ca59c5d34b8963939cf04b5eb 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16579) Add a spark install function

2016-07-15 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380453#comment-15380453
 ] 

Felix Cheung commented on SPARK-16579:
--

We should download from an official Apache release mirror.
For snapshot builds, that won't be available, though.

> Add a spark install function
> 
>
> Key: SPARK-16579
> URL: https://issues.apache.org/jira/browse/SPARK-16579
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Reporter: Shivaram Venkataraman
>
> As described in the design doc we need to introduce a function to install 
> Spark in case the user directly downloads SparkR from CRAN.
> To do that we can introduce an install_spark function that takes in the 
> following arguments
> {code}
> hadoop_version
> url_to_use # defaults to apache
> local_dir # defaults to a cache dir
> {code} 
> Furthermore, I think we can automatically run this from sparkR.init if we 
> find Spark home and the JARs missing.
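
A hedged sketch of the proposed installer (argument names from the design above;
the version string and directory layout are illustrative):

{code}
install_spark <- function(hadoop_version = "2.7",
                          url_to_use = "https://archive.apache.org/dist/spark",
                          local_dir = file.path(Sys.getenv("HOME"), ".cache/spark")) {
  pkgName <- sprintf("spark-2.0.0-bin-hadoop%s", hadoop_version)
  tarball <- file.path(local_dir, paste0(pkgName, ".tgz"))
  if (!file.exists(tarball)) {
    dir.create(local_dir, recursive = TRUE, showWarnings = FALSE)
    download.file(sprintf("%s/spark-2.0.0/%s.tgz", url_to_use, pkgName), tarball)
  }
  untar(tarball, exdir = local_dir)
  invisible(file.path(local_dir, pkgName))   # usable as SPARK_HOME
}
{code}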



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16508) Fix documentation warnings found by R CMD check

2016-07-15 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380456#comment-15380456
 ] 

Felix Cheung commented on SPARK-16508:
--

Sure, from checking 
https://gist.github.com/shivaram/62866c4ca59c5d34b8963939cf04b5eb

1. attach
{code}
* checking R code for possible problems ... NOTE
Found the following calls to attach():
File ‘SparkR/R/DataFrame.R’:
  attach(newEnv, pos = pos, name = name, warn.conflicts = warn.conflicts)
See section ‘Good practice’ in ‘?attach’.

?attach
Good practice:

 ‘attach’ has the side effect of altering the search path and this
 can easily lead to the wrong object of a particular name being
 found.  People do often forget to ‘detach’ databases.

 In interactive use, ‘with’ is usually preferable to the use of
 ‘attach’/‘detach’, unless ‘what’ is a ‘save()’-produced file in
 which case ‘attach()’ is a (safety) wrapper for ‘load()’.

 In programming, functions should not change the search path unless
 that is their purpose.  Often ‘with’ can be used within a
 function. If not, good practice is to

• Always use a distinctive ‘name’ argument, and

• To immediately follow the ‘attach’ call by an ‘on.exit’ call
  to ‘detach’ using the distinctive name.

 This ensures that the search path is left unchanged even if the
 function is interrupted or if code after the ‘attach’ call changes
 the search path.
{code}
--> Not sure what we should do here; it seems we should avoid exposing "attach" 
(see the sketch after this list for the pattern ?attach recommends).

2. missing documentation
{code}
* checking for missing documentation entries ... WARNING
Undocumented S4 methods:
  generic 'cache' and siglist 'RDD'
  generic 'collect' and siglist 'RDD'
...
{code}
--> SPARK-16519

3. Missing argument documentation
{code}
checking Rd \usage sections ... WARNING
Undocumented arguments in documentation object 'add_months'
  ‘y’ ‘x’
...
{code}
--> SPARK-16507

4. window.*
{code}
Functions with \usage entries need to have the appropriate \alias
entries, and all their arguments documented.
The \usage entries must correspond to syntactically valid R code.
See chapter ‘Writing R documentation files’ in the ‘Writing R
Extensions’ manual.
S3 methods shown with full name in documentation object 'window.orderBy':
  ‘window.orderBy’

S3 methods shown with full name in documentation object 'window.partitionBy':
  ‘window.partitionBy’
{code}
--> SPARK-16509
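
Back to item 1: the "good practice" from ?attach pairs a distinctive name with an
immediate on.exit(detach(...)) guard. A minimal sketch of that pattern:

{code}
useEnv <- function(env) {
  attach(env, name = "sparkr_attach_demo")   # distinctive name, per ?attach
  on.exit(detach("sparkr_attach_demo"))      # search path restored even on error
  get("x", pos = "sparkr_attach_demo")
}

e <- new.env()
assign("x", 42, envir = e)
useEnv(e)   # 42; search() is unchanged afterwards
{code}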

> Fix documentation warnings found by R CMD check
> ---
>
> Key: SPARK-16508
> URL: https://issues.apache.org/jira/browse/SPARK-16508
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Reporter: Shivaram Venkataraman
>
> A full list of warnings after the fixes in SPARK-16507 is at 
> https://gist.github.com/shivaram/62866c4ca59c5d34b8963939cf04b5eb 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-16508) Fix documentation warnings found by R CMD check

2016-07-15 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-16508:
-
Comment: was deleted

(was: Sure, but I thought your PR already handles most of this, and there are 
other PRs for the RDD methods and so on?
)

> Fix documentation warnings found by R CMD check
> ---
>
> Key: SPARK-16508
> URL: https://issues.apache.org/jira/browse/SPARK-16508
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Reporter: Shivaram Venkataraman
>
> A full list of warnings after the fixes in SPARK-16507 is at 
> https://gist.github.com/shivaram/62866c4ca59c5d34b8963939cf04b5eb 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16519) Handle SparkR RDD generics that create warnings in R CMD check

2016-07-15 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15380460#comment-15380460
 ] 

Felix Cheung commented on SPARK-16519:
--

I can take this

> Handle SparkR RDD generics that create warnings in R CMD check
> --
>
> Key: SPARK-16519
> URL: https://issues.apache.org/jira/browse/SPARK-16519
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Reporter: Shivaram Venkataraman
>
> One of the warnings we get from R CMD check is that RDD implementations of 
> some of the generics are not documented. These generics are shared between 
> RDD, DataFrames in SparkR. The list includes
> {quote}
> WARNING
> Undocumented S4 methods:
>   generic 'cache' and siglist 'RDD'
>   generic 'collect' and siglist 'RDD'
>   generic 'count' and siglist 'RDD'
>   generic 'distinct' and siglist 'RDD'
>   generic 'first' and siglist 'RDD'
>   generic 'join' and siglist 'RDD,RDD'
>   generic 'length' and siglist 'RDD'
>   generic 'partitionBy' and siglist 'RDD'
>   generic 'persist' and siglist 'RDD,character'
>   generic 'repartition' and siglist 'RDD'
>   generic 'show' and siglist 'RDD'
>   generic 'take' and siglist 'RDD,numeric'
>   generic 'unpersist' and siglist 'RDD'
> {quote}
> As described in 
> https://stat.ethz.ch/pipermail/r-devel/2003-September/027490.html this looks 
> like a limitation of R where exporting a generic from a package also exports 
> all the implementations of that generic. 
> One way to get around this is to remove the RDD API or rename the methods in 
> Spark 2.1



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-14816) Update MLlib, GraphX, SparkR websites for 2.0

2016-07-17 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-14816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15381792#comment-15381792
 ] 

Felix Cheung commented on SPARK-14816:
--

It could be a nice showcase to have a website? 
I do spend more time on the programming guide, though (like 
http://spark.apache.org/docs/latest/sparkr.html, 
http://spark.apache.org/docs/latest/mllib-guide.html)

> Update MLlib, GraphX, SparkR websites for 2.0
> -
>
> Key: SPARK-14816
> URL: https://issues.apache.org/jira/browse/SPARK-14816
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, GraphX, ML, MLlib, SparkR
>Reporter: Joseph K. Bradley
>Priority: Critical
>
> Update the sub-projects' websites to include new features in this release.
> For MLlib, make it clear that the DataFrame-based API is the primary one now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16508) Fix documentation warnings found by R CMD check

2016-07-17 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15381794#comment-15381794
 ] 

Felix Cheung commented on SPARK-16508:
--

I will check this since SPARK-16507 has been merged

> Fix documentation warnings found by R CMD check
> ---
>
> Key: SPARK-16508
> URL: https://issues.apache.org/jira/browse/SPARK-16508
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Reporter: Shivaram Venkataraman
>
> A full list of warnings after the fixes in SPARK-16507 is at 
> https://gist.github.com/shivaram/62866c4ca59c5d34b8963939cf04b5eb 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-15799) Release SparkR on CRAN

2016-07-18 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-15799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15381855#comment-15381855
 ] 

Felix Cheung commented on SPARK-15799:
--

Sure, I think they are very valid cases.
For #2, isn't it a lot more efficient to have it interface with Spark/Scala/JVM 
in some way, like the data source API? Going through the R-to-JVM APIs would mean 
we potentially need to pipe a lot of data through the existing SparkR-JVM socket 
connection.

I was thinking more along the lines of a (hypothetical) R binding for KeystoneML, 
SnappyData, or Stratio. Users of these platforms do not interact with Spark 
directly.

If they talk to SparkR directly without requiring users to manage Spark (e.g. not 
having to submit jobs via spark-submit), then is SparkR managing Spark? If working 
with a cluster, what deploys the Spark jar to the cluster? And what if this is a 
CDH/HDP cluster where Spark is already there?


> Release SparkR on CRAN
> --
>
> Key: SPARK-15799
> URL: https://issues.apache.org/jira/browse/SPARK-15799
> Project: Spark
>  Issue Type: New Feature
>  Components: SparkR
>Reporter: Xiangrui Meng
>
> Story: "As an R user, I would like to see SparkR released on CRAN, so I can 
> use SparkR easily in an existing R environment and have other packages built 
> on top of SparkR."
> I made this JIRA with the following questions in mind:
> * Are there known issues that prevent us releasing SparkR on CRAN?
> * Do we want to package Spark jars in the SparkR release?
> * Are there license issues?
> * How does it fit into Spark's release process?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16611) Expose several hidden DataFrame/RDD functions

2016-07-18 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383047#comment-15383047
 ] 

Felix Cheung commented on SPARK-16611:
--

Is this SPARK-16581?

> Expose several hidden DataFrame/RDD functions
> -
>
> Key: SPARK-16611
> URL: https://issues.apache.org/jira/browse/SPARK-16611
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Reporter: Oscar D. Lara Yejas
>
> Expose the following functions:
> - lapply or map
> - lapplyPartition or mapPartition
> - flatMap
> - RDD
> - toRDD
> - getJRDD
> - cleanup.jobj
> cc:
> [~javierluraschi] [~j...@rstudio.com] [~shivaram]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16611) Expose several hidden DataFrame/RDD functions

2016-07-19 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385033#comment-15385033
 ] 

Felix Cheung commented on SPARK-16611:
--

there's also spark.lapply
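
For the distributed-apply part of that list, spark.lapply already covers the use
case; a short usage sketch (assuming a running session):

{code}
sparkR.session()
res <- spark.lapply(1:4, function(x) x * 2)   # runs the function on the cluster
unlist(res)                                   # 2 4 6 8
{code}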

> Expose several hidden DataFrame/RDD functions
> -
>
> Key: SPARK-16611
> URL: https://issues.apache.org/jira/browse/SPARK-16611
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Reporter: Oscar D. Lara Yejas
>
> Expose the following functions:
> - lapply or map
> - lapplyPartition or mapPartition
> - flatMap
> - RDD
> - toRDD
> - getJRDD
> - cleanup.jobj
> cc:
> [~javierluraschi] [~j...@rstudio.com] [~shivaram]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-16507) Add CRAN checks to SparkR

2016-07-20 Thread Felix Cheung (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Cheung updated SPARK-16507:
-
Target Version/s: 2.0.0, 2.1.0  (was: 2.1.0)

> Add CRAN checks to SparkR 
> --
>
> Key: SPARK-16507
> URL: https://issues.apache.org/jira/browse/SPARK-16507
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Reporter: Shivaram Venkataraman
>Assignee: Shivaram Venkataraman
> Fix For: 2.0.0
>
>
> One of the steps to publishing SparkR is to pass the `R CMD check --as-cran`. 
> We should add a script to do this and fix any errors / warnings we find



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-16611) Expose several hidden DataFrame/RDD functions

2016-07-23 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-16611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15390856#comment-15390856
 ] 

Felix Cheung commented on SPARK-16611:
--

Where are we on this? I'm working on removing the RDD functions in SPARK-16519.

> Expose several hidden DataFrame/RDD functions
> -
>
> Key: SPARK-16611
> URL: https://issues.apache.org/jira/browse/SPARK-16611
> Project: Spark
>  Issue Type: Improvement
>  Components: SparkR
>Reporter: Oscar D. Lara Yejas
>
> Expose the following functions:
> - lapply or map
> - lapplyPartition or mapPartition
> - flatMap
> - RDD
> - toRDD
> - getJRDD
> - cleanup.jobj
> cc:
> [~javierluraschi] [~j...@rstudio.com] [~shivaram]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


