[jira] [Commented] (SPARK-1359) SGD implementation is not efficient

2016-03-30 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15219190#comment-15219190
 ] 

Yu Ishikawa commented on SPARK-1359:


[~mbaddar] Since the current ANN in MLlib depends on `GradientDescent`, we 
should improve its efficiency.
How should we evaluate a new implementation against the current one? And what 
tasks would be good for evaluating it?
- Metrics
1. Convergence efficiency
2. Compute cost
3. Compute time
4. Other
- Tasks
1. Logistic Regression and Linear Regression with randomly generated data
2. Logistic Regression and Linear Regression with Kaggle data
3. Other

I made an implementation of parallelized stochastic gradient descent:
https://github.com/yu-iskw/spark-parallelized-sgd
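
For reference, here is a minimal sketch (not the linked implementation) of a single mini-batch 
gradient step for linear regression with squared loss, which is the kind of update both the 
current `GradientDescent` and a parallelized variant would perform; `data` and the 0.1 
mini-batch fraction are assumptions for illustration only.

{code}
import breeze.linalg.{DenseVector => BDV}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// One illustrative mini-batch SGD step: sample a fraction of the data,
// sum the per-example squared-loss gradients, and move the weights.
def sgdStep(data: RDD[LabeledPoint], weights: BDV[Double], stepSize: Double): BDV[Double] = {
  val batch = data.sample(withReplacement = false, fraction = 0.1)
  val n = batch.count().max(1L)
  val gradSum = batch
    .map { p =>
      val x = BDV(p.features.toArray)
      x * ((weights dot x) - p.label)  // gradient of 0.5 * (w.x - y)^2
    }
    .reduce(_ + _)
  weights - gradSum * (stepSize / n.toDouble)
}
{code}

Note that the sampling in the first line is exactly the step the issue below points out as 
expensive, because it has to scan every example through the iterator interface.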

> SGD implementation is not efficient
> ---
>
> Key: SPARK-1359
> URL: https://issues.apache.org/jira/browse/SPARK-1359
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 0.9.0, 1.0.0
>Reporter: Xiangrui Meng
>
> The SGD implementation samples a mini-batch to compute the stochastic 
> gradient. This is not efficient because examples are provided via an iterator 
> interface. We have to scan all of them to obtain a sample.






[jira] [Created] (SPARK-13265) Refactoring of basic ML import/export for other file system besides HDFS

2016-02-10 Thread Yu Ishikawa (JIRA)
Yu Ishikawa created SPARK-13265:
---

 Summary: Refactoring of basic ML import/export for other file 
system besides HDFS
 Key: SPARK-13265
 URL: https://issues.apache.org/jira/browse/SPARK-13265
 Project: Spark
  Issue Type: Bug
  Components: ML
Reporter: Yu Ishikawa


We can't save a model to a file system other than HDFS, for example Amazon S3, 
because the file system is fixed in Spark 1.6.

https://github.com/apache/spark/blob/v1.6.0/mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala#L78

{noformat}
scala> val kmeans = new KMeans().setK(2)
scala> model.write.overwrite().save("s3n://test-bucket/tmp/test-kmeans/")
java.lang.IllegalArgumentException: Wrong FS: s3n://test-bucket/tmp/test-kmeans, 
expected: hdfs://ec2-54-248-42-97.ap-northeast-1.compute.amazonaws.com:9000
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:590)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:170)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332)
at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:80)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:36)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:41)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:43)
at $iwC$$iwC$$iwC$$iwC$$iwC.(:45)
at $iwC$$iwC$$iwC$$iwC.(:47)
at $iwC$$iwC$$iwC.(:49)
at $iwC$$iwC.(:51)
at $iwC.(:53)
at (:55)
at .(:59)
at .()
at .(:7)
at .()
at $print()
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at 
org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
at 
org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at 
org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at 
org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
at 
org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
at 
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
at 
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at 
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at 
scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at 
org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at 
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
{noformat}
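
For what it's worth, a minimal sketch of the kind of change that would fix this: resolve the 
{{FileSystem}} from the output path's scheme instead of asking for the default (HDFS) file 
system. The helper name and the {{hadoopConf}} argument are assumptions for illustration, 
not the actual code in {{ReadWrite.scala}}.

{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Resolve the FileSystem from the path itself (hdfs://, s3n://, file://, ...)
// rather than FileSystem.get(conf), which always returns the default FS.
def outputExists(path: String, hadoopConf: Configuration): Boolean = {
  val p = new Path(path)
  val fs = p.getFileSystem(hadoopConf)
  fs.exists(p)
}
{code}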






[jira] [Updated] (SPARK-13265) Refactoring of basic ML import/export for other file system besides HDFS

2016-02-10 Thread Yu Ishikawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Ishikawa updated SPARK-13265:

Description: 
We can't save a model to a file system other than HDFS, for example Amazon S3, 
because the file system is fixed in Spark 1.6.

https://github.com/apache/spark/blob/v1.6.0/mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala#L78

When I tried to export a KMeans model in to Amazon S3
{noformat}
scala> val kmeans = new KMeans().setK(2)
scala> model.write.overwrite().save("s3n://test-bucket/tmp/test-kmeans/")
java.lang.IllegalArgumentException: Wrong FS: s3n://test-bucket/tmp/test-kmeans, 
expected: hdfs://ec2-54-248-42-97.ap-northeast-1.compute.amazonaws.com:9000
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:590)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:170)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332)
at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:80)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:36)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:41)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:43)
at $iwC$$iwC$$iwC$$iwC$$iwC.(:45)
at $iwC$$iwC$$iwC$$iwC.(:47)
at $iwC$$iwC$$iwC.(:49)
at $iwC$$iwC.(:51)
at $iwC.(:53)
at (:55)
at .(:59)
at .()
at .(:7)
at .()
at $print()
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at 
org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
at 
org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at 
org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at 
org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
at 
org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
at 
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
at 
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at 
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at 
scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at 
org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at 
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
{noformat}

  was:
We can't save a model to a file system other than HDFS, for example Amazon S3, 
because the file system is fixed in Spark 1.6.

https://github.com/apache/spark/blob/v1.6.0/mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala#L78

{noformat}
scala> val kmeans = new KMeans().setK(2)
scala> model.write.overwrite().save("s3n://test-bucket/tmp/test-kmeans/")
java.lang.IllegalArgumentException: Wrong FS: s3n://test-bucket/tmp/test-kmeans, 
expected: hdfs://ec2-54-248-42-97.ap-northeast-1.compute.amazonaws.com:9000
at 

[jira] [Updated] (SPARK-13265) Refactoring of basic ML import/export for other file system besides HDFS

2016-02-10 Thread Yu Ishikawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Ishikawa updated SPARK-13265:

Description: 
We can't save a model to a file system other than HDFS, for example Amazon S3, 
because the file system is fixed in Spark 1.6.

https://github.com/apache/spark/blob/v1.6.0/mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala#L78

When I tried to export a KMeans model in to Amazon S3
{noformat}
scala> val kmeans = new KMeans().setK(2)
scala> kmeans.fit(train)
scala> model.write.overwrite().save("s3n://test-bucket/tmp/test-kmeans/")
java.lang.IllegalArgumentException: Wrong FS: s3n://test-bucket/tmp/test-kmeans, 
expected: hdfs://ec2-54-248-42-97.ap-northeast-1.compute.amazonaws.com:9000
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:590)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:170)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332)
at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:80)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:36)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:41)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:43)
at $iwC$$iwC$$iwC$$iwC$$iwC.(:45)
at $iwC$$iwC$$iwC$$iwC.(:47)
at $iwC$$iwC$$iwC.(:49)
at $iwC$$iwC.(:51)
at $iwC.(:53)
at (:55)
at .(:59)
at .()
at .(:7)
at .()
at $print()
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at 
org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
at 
org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at 
org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at 
org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
at 
org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
at 
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
at 
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at 
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at 
scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at 
org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at 
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
{noformat}

  was:
We can't save a model to a file system other than HDFS, for example Amazon S3, 
because the file system is fixed in Spark 1.6.

https://github.com/apache/spark/blob/v1.6.0/mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala#L78

When I tried to export a KMeans model in to Amazon S3
{noformat}
scala> val kmeans = new KMeans().setK(2)
scala> model.write.overwrite().save("s3n://test-bucket/tmp/test-kmeans/")
java.lang.IllegalArgumentException: Wrong FS: 
s3n://test-bucket/tmp/test-kmeans, expected: 

[jira] [Updated] (SPARK-13265) Refactoring of basic ML import/export for other file system besides HDFS

2016-02-10 Thread Yu Ishikawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Ishikawa updated SPARK-13265:

Description: 
We can't save a model to a file system other than HDFS, for example Amazon S3, 
because the file system is fixed in Spark 1.6.

https://github.com/apache/spark/blob/v1.6.0/mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala#L78

When I tried to export a KMeans model to Amazon S3, I got the following error.

{noformat}
scala> val kmeans = new KMeans().setK(2)
scala> val model = kmeans.fit(train)
scala> model.write.overwrite().save("s3n://test-bucket/tmp/test-kmeans/")
java.lang.IllegalArgumentException: Wrong FS: s3n://test-bucket/tmp/test-kmeans, 
expected: hdfs://ec2-54-248-42-97.ap-northeast-1.compute.amazonaws.com:9000
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:590)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:170)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332)
at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:80)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:36)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:41)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.(:43)
at $iwC$$iwC$$iwC$$iwC$$iwC.(:45)
at $iwC$$iwC$$iwC$$iwC.(:47)
at $iwC$$iwC$$iwC.(:49)
at $iwC$$iwC.(:51)
at $iwC.(:53)
at (:55)
at .(:59)
at .()
at .(:7)
at .()
at $print()
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at 
org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
at 
org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at 
org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at 
org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
at 
org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
at 
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
at 
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at 
org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at 
scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at 
org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at 
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
{noformat}


  was:
We can't save a model to a file system other than HDFS, for example Amazon S3, 
because the file system is fixed in Spark 1.6.

https://github.com/apache/spark/blob/v1.6.0/mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala#L78

When I tried to export a KMeans model in to Amazon S3
{noformat}
scala> val kmeans = new KMeans().setK(2)
scala> kmeans.fit(train)
scala> model.write.overwrite().save("s3n://test-bucket/tmp/test-kmeans/")
java.lang.IllegalArgumentException: Wrong FS: 

[jira] [Commented] (SPARK-11618) Refactoring of basic ML import/export

2016-02-09 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15140103#comment-15140103
 ] 

Yu Ishikawa commented on SPARK-11618:
-

Hi [~josephkb], 

I have a question about ML import/export. It seems that the current save 
method only supports saving a model to HDFS. Is there any plan to support 
saving to other file systems, such as Amazon S3?
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala#L77

> Refactoring of basic ML import/export
> -
>
> Key: SPARK-11618
> URL: https://issues.apache.org/jira/browse/SPARK-11618
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Reporter: Joseph K. Bradley
>Assignee: Joseph K. Bradley
> Fix For: 1.6.0
>
>
> This is for a few updates to the original PR for basic ML import/export in 
> [SPARK-11217].
> * The original PR diverges from the design doc in that it does not include 
> the Spark version or a model format version.  We should include the Spark 
> version in the metadata.  If we do that, then we don't really need a model 
> format version.
> * Proposal: DefaultParamsWriter includes two separable pieces of logic in 
> save(): (a) handling overwriting and (b) saving Params.  I want to separate 
> these by putting (a) in a save() method in Writer which calls an abstract 
> saveImpl, and (b) in the saveImpl implementation in DefaultParamsWriter.  
> This is described below:
> {code}
> abstract class Writer {
>   def save(path: String) = {
> // handle overwrite
> saveImpl(path)
>   }
>   def saveImpl(path: String)   // abstract
> }
> class DefaultParamsWriter extends Writer {
>   def saveImpl(path: String) = {
> // save Params
>   }
> }
> {code}






[jira] [Created] (SPARK-13239) Click-Through Rate Prediction

2016-02-08 Thread Yu Ishikawa (JIRA)
Yu Ishikawa created SPARK-13239:
---

 Summary: Click-Through Rate Prediction
 Key: SPARK-13239
 URL: https://issues.apache.org/jira/browse/SPARK-13239
 Project: Spark
  Issue Type: Sub-task
  Components: ML
Reporter: Yu Ishikawa
Priority: Minor


Apply the ML Pipeline API to Click-Through Rate Prediction:
https://www.kaggle.com/c/avazu-ctr-prediction
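
As a rough sketch of what applying the ML Pipeline API here could look like (the {{site_id}} 
and {{click}} column names follow the Avazu schema, but the feature choice and {{trainingDF}} 
are assumptions for illustration):

{code}
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer, VectorAssembler}

// Index one categorical column, one-hot encode it, assemble the feature
// vector, and fit logistic regression against the `click` label.
val indexer = new StringIndexer().setInputCol("site_id").setOutputCol("site_idx")
val encoder = new OneHotEncoder().setInputCol("site_idx").setOutputCol("site_vec")
val assembler = new VectorAssembler().setInputCols(Array("site_vec")).setOutputCol("features")
val lr = new LogisticRegression().setLabelCol("click").setFeaturesCol("features")

val pipeline = new Pipeline().setStages(Array(indexer, encoder, assembler, lr))
// val model = pipeline.fit(trainingDF)  // trainingDF would be the loaded training data
{code}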






[jira] [Commented] (SPARK-13239) Click-Through Rate Prediction

2016-02-08 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-13239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138490#comment-15138490
 ] 

Yu Ishikawa commented on SPARK-13239:
-

I'm working on this issue.

> Click-Through Rate Prediction
> -
>
> Key: SPARK-13239
> URL: https://issues.apache.org/jira/browse/SPARK-13239
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Reporter: Yu Ishikawa
>Priority: Minor
>
> Apply ML Pipeline API to Click-Through Rate Prediction
> https://www.kaggle.com/c/avazu-ctr-prediction






[jira] [Commented] (SPARK-10870) Criteo Display Advertising Challenge

2016-01-20 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15110060#comment-15110060
 ] 

Yu Ishikawa commented on SPARK-10870:
-

[~prudenko] Should we test the Kaggle data with the GBDT encoder used in the 
winning solution?

> Criteo Display Advertising Challenge
> 
>
> Key: SPARK-10870
> URL: https://issues.apache.org/jira/browse/SPARK-10870
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Reporter: Peter Rudenko
>
> Very useful dataset to test pipeline because of:
> # "Big data" dataset - original Kaggle competition dataset is 12 gb, but 
> there's [1tb|http://labs.criteo.com/downloads/download-terabyte-click-logs/] 
> dataset of the same schema as well.
> # Sparse models - categorical features have high cardinality
> # Reproducible results - because it's public and many other distributed 
> machine learning libraries (e.g. 
> [wormwhole|https://github.com/dmlc/wormhole/blob/master/doc/tutorial/criteo_kaggle.rst],
>  [parameter 
> server|https://github.com/dmlc/parameter_server/blob/master/example/linear/criteo/README.md],
>  [azure 
> ml|https://azure.microsoft.com/en-us/documentation/articles/machine-learning-data-science-process-hive-criteo-walkthrough/#mltasks]
>  etc.) have made baseline benchmarks against which we could compare.
> I have some baseline results with custom models (GBDT encoders and tuned LR) 
> on spark-1.4. I will make pipelines using the public Spark models. [Winning 
> solution|http://www.csie.ntu.edu.tw/~r01922136/kaggle-2014-criteo.pdf] used 
> GBDT encoder (not available in spark, but not difficult to make one from GBT 
> from mllib) + hashing + factorization machine (planned for spark-1.6).






[jira] [Commented] (SPARK-12215) User guide section for KMeans in spark.ml

2015-12-09 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049975#comment-15049975
 ] 

Yu Ishikawa commented on SPARK-12215:
-

I'll work on this issue.

> User guide section for KMeans in spark.ml
> -
>
> Key: SPARK-12215
> URL: https://issues.apache.org/jira/browse/SPARK-12215
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, ML
>Reporter: Joseph K. Bradley
>Assignee: Yu Ishikawa
>
> [~yuu.ishik...@gmail.com] Will you have time to add a user guide section for 
> this?  Thanks in advance!






[jira] [Commented] (SPARK-12215) User guide section for KMeans in spark.ml

2015-12-09 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050078#comment-15050078
 ] 

Yu Ishikawa commented on SPARK-12215:
-

I have sent a pull request about this issue. 
https://github.com/apache/spark/pull/10244

> User guide section for KMeans in spark.ml
> -
>
> Key: SPARK-12215
> URL: https://issues.apache.org/jira/browse/SPARK-12215
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, ML
>Reporter: Joseph K. Bradley
>Assignee: Yu Ishikawa
>
> [~yuu.ishik...@gmail.com] Will you have time to add a user guide section for 
> this?  Thanks in advance!






[jira] [Commented] (SPARK-10285) Add @since annotation to pyspark.ml.util

2015-12-09 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15050085#comment-15050085
 ] 

Yu Ishikawa commented on SPARK-10285:
-

[~mengxr] can we close this issue? [~davies] told me that there are no public 
methods under pyspark.ml.util.

https://github.com/apache/spark/pull/8695#issuecomment-139377373

> Add @since annotation to pyspark.ml.util
> 
>
> Key: SPARK-10285
> URL: https://issues.apache.org/jira/browse/SPARK-10285
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, ML, PySpark
>Reporter: Xiangrui Meng
>Assignee: Yu Ishikawa
>Priority: Minor
>  Labels: starter
>







[jira] [Commented] (SPARK-6518) Add example code and user guide for bisecting k-means

2015-11-24 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15025717#comment-15025717
 ] 

Yu Ishikawa commented on SPARK-6518:


All right. I'll send a PR soon. Thanks!

> Add example code and user guide for bisecting k-means
> -
>
> Key: SPARK-6518
> URL: https://issues.apache.org/jira/browse/SPARK-6518
> Project: Spark
>  Issue Type: Documentation
>  Components: MLlib
>Reporter: Yu Ishikawa
>Assignee: Yu Ishikawa
>







[jira] [Commented] (SPARK-6518) Add example code and user guide for bisecting k-means

2015-11-24 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15025995#comment-15025995
 ] 

Yu Ishikawa commented on SPARK-6518:


Can I split this issue into docs and an example?

> Add example code and user guide for bisecting k-means
> -
>
> Key: SPARK-6518
> URL: https://issues.apache.org/jira/browse/SPARK-6518
> Project: Spark
>  Issue Type: Documentation
>  Components: MLlib
>Reporter: Yu Ishikawa
>Assignee: Yu Ishikawa
>







[jira] [Commented] (SPARK-8459) Add import/export to spark.mllib bisecting k-means

2015-11-12 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003015#comment-15003015
 ] 

Yu Ishikawa commented on SPARK-8459:


I'm working on this issue.

> Add import/export to spark.mllib bisecting k-means
> --
>
> Key: SPARK-8459
> URL: https://issues.apache.org/jira/browse/SPARK-8459
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Yu Ishikawa
>  Labels: 1.7.0
>







[jira] [Commented] (SPARK-11664) Add methods to get bisecting k-means cluster structure

2015-11-12 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003085#comment-15003085
 ] 

Yu Ishikawa commented on SPARK-11664:
-

[~srowen] thank you for letting me know. I intended to set it as "Labels".

> Add methods to get bisecting k-means cluster structure
> --
>
> Key: SPARK-11664
> URL: https://issues.apache.org/jira/browse/SPARK-11664
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib
>Reporter: Yu Ishikawa
>Priority: Minor
>
> I think users want to visualize the result of bisecting k-means clustering as 
> a dendrogram in order to confirm it. So it would be great to support methods 
> to get the cluster tree structure as an adjacency list, a linkage matrix, and 
> so on.






[jira] [Created] (SPARK-11666) Find the best `k` by cutting bisecting k-means cluster tree without recomputation

2015-11-11 Thread Yu Ishikawa (JIRA)
Yu Ishikawa created SPARK-11666:
---

 Summary: Find the best `k` by cutting bisecting k-means cluster 
tree without recomputation
 Key: SPARK-11666
 URL: https://issues.apache.org/jira/browse/SPARK-11666
 Project: Spark
  Issue Type: Sub-task
  Components: MLlib
Reporter: Yu Ishikawa
Priority: Minor


For example, scikit-learn's hierarchical clustering supports extracting a partial 
tree from the result. We should support a similar feature in order to reduce 
computation cost.
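
A rough sketch of the idea, using a hypothetical binary tree node (the real bisecting k-means 
tree representation in MLlib is different): given a tree that was already computed for a large 
k, keep splitting the leaf with the largest cost until the desired number of clusters is 
reached, with no re-clustering.

{code}
// Hypothetical node type for illustration only.
case class TreeNode(cost: Double, children: Seq[TreeNode]) {
  def isLeaf: Boolean = children.isEmpty
}

// Cut a precomputed cluster tree down to k clusters by repeatedly expanding
// the expandable frontier node with the largest cost.
def cutTree(root: TreeNode, k: Int): Seq[TreeNode] = {
  var frontier = Seq(root)
  while (frontier.size < k && frontier.exists(!_.isLeaf)) {
    val target = frontier.filter(!_.isLeaf).maxBy(_.cost)
    frontier = frontier.filterNot(_ eq target) ++ target.children
  }
  frontier
}
{code}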






[jira] [Created] (SPARK-11664) Add methods to get bisecting k-means cluster structure

2015-11-11 Thread Yu Ishikawa (JIRA)
Yu Ishikawa created SPARK-11664:
---

 Summary: Add methods to get bisecting k-means cluster structure
 Key: SPARK-11664
 URL: https://issues.apache.org/jira/browse/SPARK-11664
 Project: Spark
  Issue Type: Sub-task
  Components: MLlib
Reporter: Yu Ishikawa
Priority: Minor


I think users want to visualize the result of bisecting k-means clustering as a 
dendrogram in order to confirm it. So it would be great to support methods to 
get the cluster tree structure as an adjacency list, a linkage matrix, and so on.
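
For illustration, a small sketch of exporting such a tree as an adjacency list, again with a 
hypothetical node type since the model's internal tree representation may differ:

{code}
// Hypothetical node type for illustration only.
case class ClusterNode(id: Int, children: Seq[ClusterNode])

// Flatten the cluster tree into (parentId, childId) edges, which is easy to
// hand to external dendrogram / graph visualization tools.
def toAdjacencyList(root: ClusterNode): Seq[(Int, Int)] =
  root.children.flatMap(child => (root.id, child.id) +: toAdjacencyList(child))
{code}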






[jira] [Created] (SPARK-11665) Support other distance metrics for bisecting k-means

2015-11-11 Thread Yu Ishikawa (JIRA)
Yu Ishikawa created SPARK-11665:
---

 Summary: Support other distance metrics for bisecting k-means
 Key: SPARK-11665
 URL: https://issues.apache.org/jira/browse/SPARK-11665
 Project: Spark
  Issue Type: Sub-task
  Components: MLlib
Reporter: Yu Ishikawa
Priority: Minor


Some users have requested support for other distance metrics in bisecting 
k-means, such as cosine distance and Tanimoto distance.

We should
- design the interfaces for distance metrics (a rough sketch follows below)
- support those distances
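
One possible shape for that interface, as a sketch only (the names are assumptions, not an 
agreed design):

{code}
import org.apache.spark.mllib.linalg.Vector

// A pluggable distance measure that bisecting k-means could accept as a parameter.
trait DistanceMeasure extends Serializable {
  def distance(a: Vector, b: Vector): Double
}

object CosineDistance extends DistanceMeasure {
  override def distance(a: Vector, b: Vector): Double = {
    val (x, y) = (a.toArray, b.toArray)
    val dot = x.zip(y).map { case (u, v) => u * v }.sum
    val norms = math.sqrt(x.map(u => u * u).sum) * math.sqrt(y.map(v => v * v).sum)
    if (norms == 0.0) 0.0 else 1.0 - dot / norms
  }
}
{code}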






[jira] [Updated] (SPARK-8459) Add import/export to spark.mllib bisecting k-means

2015-11-10 Thread Yu Ishikawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-8459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Ishikawa updated SPARK-8459:
---
Labels: 1.7.0  (was: )

> Add import/export to spark.mllib bisecting k-means
> --
>
> Key: SPARK-8459
> URL: https://issues.apache.org/jira/browse/SPARK-8459
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Yu Ishikawa
>  Labels: 1.7.0
>







[jira] [Commented] (SPARK-11611) Python API for bisecting k-means

2015-11-09 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997875#comment-14997875
 ] 

Yu Ishikawa commented on SPARK-11611:
-

[~mengxr] can we change the target version from 1.7.0 to 1.6.0?

> Python API for bisecting k-means
> 
>
> Key: SPARK-11611
> URL: https://issues.apache.org/jira/browse/SPARK-11611
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib, PySpark
>Reporter: Xiangrui Meng
>
> Implement Python API for bisecting k-means.






[jira] [Updated] (SPARK-11610) Make the docs of LDAModel.describeTopics in Python more specific

2015-11-09 Thread Yu Ishikawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Ishikawa updated SPARK-11610:

Target Version/s:   (was: 1.6.0)

> Make the docs of LDAModel.describeTopics in Python more specific
> 
>
> Key: SPARK-11610
> URL: https://issues.apache.org/jira/browse/SPARK-11610
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib, PySpark
>Reporter: Yu Ishikawa
>Assignee: Yu Ishikawa
>Priority: Trivial
>







[jira] [Commented] (SPARK-11610) Make the docs of LDAModel.describeTopics in Python more specific

2015-11-09 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997716#comment-14997716
 ] 

Yu Ishikawa commented on SPARK-11610:
-

[~josephkb] sorry for that. Thanks for fixing it.

> Make the docs of LDAModel.describeTopics in Python more specific
> 
>
> Key: SPARK-11610
> URL: https://issues.apache.org/jira/browse/SPARK-11610
> Project: Spark
>  Issue Type: Sub-task
>  Components: MLlib, PySpark
>Reporter: Yu Ishikawa
>Assignee: Yu Ishikawa
>Priority: Trivial
>







[jira] [Commented] (SPARK-11611) Python API for bisecting k-means

2015-11-09 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997725#comment-14997725
 ] 

Yu Ishikawa commented on SPARK-11611:
-

I'm working on this issue.

> Python API for bisecting k-means
> 
>
> Key: SPARK-11611
> URL: https://issues.apache.org/jira/browse/SPARK-11611
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib, PySpark
>Reporter: Xiangrui Meng
>
> Implement Python API for bisecting k-means.






[jira] [Comment Edited] (SPARK-6517) Bisecting k-means clustering

2015-11-09 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997724#comment-14997724
 ] 

Yu Ishikawa edited comment on SPARK-6517 at 11/10/15 12:20 AM:
---

[~jeffzhang] thank you for your cooperation. But, I'm ready to send a PR for 
that issue.


was (Author: yuu.ishik...@gmail.com):
[~jeffzhang] thank you for your cooperation. But, I'm readdy to send a PR for 
that issue.

> Bisecting k-means clustering
> 
>
> Key: SPARK-6517
> URL: https://issues.apache.org/jira/browse/SPARK-6517
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Yu Ishikawa
>Assignee: Yu Ishikawa
>  Labels: clustering
> Fix For: 1.6.0
>
>







[jira] [Commented] (SPARK-6517) Bisecting k-means clustering

2015-11-09 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14997724#comment-14997724
 ] 

Yu Ishikawa commented on SPARK-6517:


[~jeffzhang] thank you for your cooperation. But, I'm readdy to send a PR for 
that issue.

> Bisecting k-means clustering
> 
>
> Key: SPARK-6517
> URL: https://issues.apache.org/jira/browse/SPARK-6517
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib
>Reporter: Yu Ishikawa
>Assignee: Yu Ishikawa
>  Labels: clustering
> Fix For: 1.6.0
>
>







[jira] [Created] (SPARK-11610) Make the docs of LDAModel.describeTopics in Python more specific

2015-11-09 Thread Yu Ishikawa (JIRA)
Yu Ishikawa created SPARK-11610:
---

 Summary: Make the docs of LDAModel.describeTopics in Python more 
specific
 Key: SPARK-11610
 URL: https://issues.apache.org/jira/browse/SPARK-11610
 Project: Spark
  Issue Type: Sub-task
  Components: MLlib, PySpark
Reporter: Yu Ishikawa
Priority: Trivial









[jira] [Created] (SPARK-11566) Refactoring GaussianMixtureModel.gaussians in Python

2015-11-06 Thread Yu Ishikawa (JIRA)
Yu Ishikawa created SPARK-11566:
---

 Summary: Refactoring GaussianMixtureModel.gaussians in Python
 Key: SPARK-11566
 URL: https://issues.apache.org/jira/browse/SPARK-11566
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Affects Versions: 1.5.1
Reporter: Yu Ishikawa
Priority: Trivial


We could also implement {{GaussianMixtureModelWrapper.gaussians}} in Scala with 
{{SerDe.dumps}}, instead of returning a Java {{Object}}. That would be a little 
simpler and more efficient.






[jira] [Updated] (SPARK-11566) Refactoring GaussianMixtureModel.gaussians in Python

2015-11-06 Thread Yu Ishikawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Ishikawa updated SPARK-11566:

Component/s: PySpark

> Refactoring GaussianMixtureModel.gaussians in Python
> 
>
> Key: SPARK-11566
> URL: https://issues.apache.org/jira/browse/SPARK-11566
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib, PySpark
>Affects Versions: 1.5.1
>Reporter: Yu Ishikawa
>Priority: Trivial
>
> We could also implement {{GaussianMixtureModelWrapper.gaussians}} in Scala 
> with {{SerDe.dumps}}, instead of returning a Java {{Object}}. That would be a 
> little simpler and more efficient.






[jira] [Commented] (SPARK-11515) QuantileDiscretizer should take random seed

2015-11-05 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991294#comment-14991294
 ] 

Yu Ishikawa commented on SPARK-11515:
-

I'll work on this issue.

> QuantileDiscretizer should take random seed
> ---
>
> Key: SPARK-11515
> URL: https://issues.apache.org/jira/browse/SPARK-11515
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Reporter: Joseph K. Bradley
>Priority: Minor
>
> QuantileDiscretizer takes a random sample to select bins.  It currently does 
> not specify a seed for the XORShiftRandom, but it should take a seed by 
> extending the HasSeed Param.
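
For reference, a minimal sketch of what such a shared seed Param looks like; this mirrors how 
the existing shared params are defined, but the trait name here is made up for illustration, 
and the real fix would simply mix the existing {{HasSeed}} into QuantileDiscretizer.

{code}
import org.apache.spark.ml.param.{LongParam, Params}

// Sketch of a seed Param, modeled on the shared HasSeed trait.
trait HasSeedSketch extends Params {
  final val seed: LongParam = new LongParam(this, "seed", "random seed")
  setDefault(seed, this.getClass.getName.hashCode.toLong)
  final def getSeed: Long = $(seed)
}
{code}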






[jira] [Commented] (SPARK-9722) Pass random seed to spark.ml RandomForest findSplitsBins

2015-11-04 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14990877#comment-14990877
 ] 

Yu Ishikawa commented on SPARK-9722:


[~josephkb] I'll add a seed Param to {{DecisionTreeClassifier}} and 
{{DecisionTreeRegressor}}.

> Pass random seed to spark.ml RandomForest findSplitsBins
> 
>
> Key: SPARK-9722
> URL: https://issues.apache.org/jira/browse/SPARK-9722
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Reporter: Joseph K. Bradley
>Assignee: Yu Ishikawa
>Priority: Trivial
> Fix For: 1.6.0
>
>
> Trees use XORShiftRandom when binning continuous features.  Currently, they 
> use a fixed seed of 1.  They should accept a random seed param and use that 
> instead.






[jira] [Commented] (SPARK-10729) word2vec model save for python

2015-11-04 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14991206#comment-14991206
 ] 

Yu Ishikawa commented on SPARK-10729:
-

Sorry, the cause isn't `@inherit_doc`. I misunderstood.
Anyway, we should discuss the documentation.

> word2vec model save for python
> --
>
> Key: SPARK-10729
> URL: https://issues.apache.org/jira/browse/SPARK-10729
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 1.4.1, 1.5.0
>Reporter: Joseph A Gartner III
> Fix For: 1.5.0
>
>
> The ability to save a word2vec model has not been ported to python, and would 
> be extremely useful to have given the long training period.






[jira] [Created] (SPARK-11492) Ignore commented code warnings in SparkR

2015-11-03 Thread Yu Ishikawa (JIRA)
Yu Ishikawa created SPARK-11492:
---

 Summary: Ignore commented code warnings in SparkR
 Key: SPARK-11492
 URL: https://issues.apache.org/jira/browse/SPARK-11492
 Project: Spark
  Issue Type: Sub-task
  Components: SparkR
Reporter: Yu Ishikawa
Priority: Minor


cc [~shivaram]

We get many commented-code warnings under the latest lintr. Should we turn 
off the {{commented_code_linter}} rule of lintr?

{noformat}
# rdd <- lapply(parallelize(sc, 1:10), function(x) list(a=x, b=as.character(x)))
  ^~
R/SQLContext.R:176:3: style: Commented code should be removed.
# df <- toDF(rdd)
  ^~~
R/SQLContext.R:232:3: style: Commented code should be removed.
# sc <- sparkR.init()
  ^~~
R/SQLContext.R:233:3: style: Commented code should be removed.
# sqlContext <- sparkRSQL.init(sc)
  ^~~~
R/SQLContext.R:234:3: style: Commented code should be removed.
# rdd <- texFile(sc, "path/to/json")
  ^~
R/SQLContext.R:235:3: style: Commented code should be removed.
# df <- jsonRDD(sqlContext, rdd)
  ^~
{noformat}






[jira] [Closed] (SPARK-11492) Ignore commented code warnings in SparkR

2015-11-03 Thread Yu Ishikawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-11492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Ishikawa closed SPARK-11492.
---
Resolution: Duplicate

> Ignore commented code warnings in SparkR
> 
>
> Key: SPARK-11492
> URL: https://issues.apache.org/jira/browse/SPARK-11492
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Reporter: Yu Ishikawa
>Priority: Minor
> Fix For: 1.5.0
>
>
> cc [~shivaram]
> We get many commented-code warnings under the latest lintr. Should we turn 
> off the {{commented_code_linter}} rule of lintr?
> {noformat}
> # rdd <- lapply(parallelize(sc, 1:10), function(x) list(a=x, 
> b=as.character(x)))
>   
> ^~
> R/SQLContext.R:176:3: style: Commented code should be removed.
> # df <- toDF(rdd)
>   ^~~
> R/SQLContext.R:232:3: style: Commented code should be removed.
> # sc <- sparkR.init()
>   ^~~
> R/SQLContext.R:233:3: style: Commented code should be removed.
> # sqlContext <- sparkRSQL.init(sc)
>   ^~~~
> R/SQLContext.R:234:3: style: Commented code should be removed.
> # rdd <- texFile(sc, "path/to/json")
>   ^~
> R/SQLContext.R:235:3: style: Commented code should be removed.
> # df <- jsonRDD(sqlContext, rdd)
>   ^~
> {noformat}






[jira] [Commented] (SPARK-11492) Ignore commented code warnings in SparkR

2015-11-03 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14988952#comment-14988952
 ] 

Yu Ishikawa commented on SPARK-11492:
-

[~andreas.fe...@herold.at] sorry, thank you for letting me know.

> Ignore commented code warnings in SparkR
> 
>
> Key: SPARK-11492
> URL: https://issues.apache.org/jira/browse/SPARK-11492
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Reporter: Yu Ishikawa
>Priority: Minor
> Fix For: 1.5.0
>
>
> cc [~shivaram]
> We get many commented-code warnings under the latest lintr. Should we turn 
> off the {{commented_code_linter}} rule of lintr?
> {noformat}
> # rdd <- lapply(parallelize(sc, 1:10), function(x) list(a=x, 
> b=as.character(x)))
>   
> ^~
> R/SQLContext.R:176:3: style: Commented code should be removed.
> # df <- toDF(rdd)
>   ^~~
> R/SQLContext.R:232:3: style: Commented code should be removed.
> # sc <- sparkR.init()
>   ^~~
> R/SQLContext.R:233:3: style: Commented code should be removed.
> # sqlContext <- sparkRSQL.init(sc)
>   ^~~~
> R/SQLContext.R:234:3: style: Commented code should be removed.
> # rdd <- texFile(sc, "path/to/json")
>   ^~
> R/SQLContext.R:235:3: style: Commented code should be removed.
> # df <- jsonRDD(sqlContext, rdd)
>   ^~
> {noformat}






[jira] [Commented] (SPARK-10729) word2vec model save for python

2015-11-03 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14988962#comment-14988962
 ] 

Yu Ishikawa commented on SPARK-10729:
-

Got it. That's because {{Word2VecModel}} in Python doesn't have {{@inherit_doc}} 
attached. I think we should add the tag.

Could you close this issue? It would be great to discuss improving the 
documentation on another issue.

> word2vec model save for python
> --
>
> Key: SPARK-10729
> URL: https://issues.apache.org/jira/browse/SPARK-10729
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 1.4.1, 1.5.0
>Reporter: Joseph A Gartner III
>
> The ability to save a word2vec model has not been ported to python, and would 
> be extremely useful to have given the long training period.






[jira] [Commented] (SPARK-10729) word2vec model save for python

2015-11-03 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14988926#comment-14988926
 ] 

Yu Ishikawa commented on SPARK-10729:
-

[~jgartner] Is this issue the same as 
https://issues.apache.org/jira/browse/SPARK-7104?

> word2vec model save for python
> --
>
> Key: SPARK-10729
> URL: https://issues.apache.org/jira/browse/SPARK-10729
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 1.4.1, 1.5.0
>Reporter: Joseph A Gartner III
>
> The ability to save a word2vec model has not been ported to python, and would 
> be extremely useful to have given the long training period.






[jira] [Commented] (SPARK-11263) lintr Throws Warnings on Commented Code in Documentation

2015-11-03 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-11263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14988965#comment-14988965
 ] 

Yu Ishikawa commented on SPARK-11263:
-

[~felixcheung] I think it would be great to ignore the commented-code warnings 
via the lintr settings.

> lintr Throws Warnings on Commented Code in Documentation
> 
>
> Key: SPARK-11263
> URL: https://issues.apache.org/jira/browse/SPARK-11263
> Project: Spark
>  Issue Type: Task
>  Components: SparkR
>Reporter: Sen Fang
>Priority: Minor
>
> This comes from a discussion in https://github.com/apache/spark/pull/9205
> Currently lintr throws many warnings around "style: Commented code should be 
> removed."
> For example
> {code}
> R/RDD.R:260:3: style: Commented code should be removed.
> # unpersist(rdd) # rdd@@env$isCached == FALSE
>   ^~~
> R/RDD.R:283:3: style: Commented code should be removed.
> # sc <- sparkR.init()
>   ^~~
> R/RDD.R:284:3: style: Commented code should be removed.
> # setCheckpointDir(sc, "checkpoint")
>   ^~
> {code}
> Some of them are legitimate warnings but most of them are simply code 
> examples of functions that are not part of public API. For example
> {code}
> # @examples
> #\dontrun{
> # sc <- sparkR.init()
> # rdd <- parallelize(sc, 1:10, 2L)
> # cache(rdd)
> #}
> {code}
> One workaround is to convert them back to Roxygen doc but assign {{#' @rdname 
> .ignore}} and Roxygen will skip these functions with message {{Skipping 
> invalid path: .ignore.Rd}}
> That being said, I feel people usually praise/criticize R package 
> documentation as "expert friendly". The convention seems to be to provide as 
> much documentation as possible but not to export functions that are unstable 
> or developer-only. If users choose to use them, they acknowledge the risk by 
> using {{:::}}.






[jira] [Commented] (SPARK-6001) K-Means clusterer should return the assignments of input points to clusters

2015-11-01 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14984574#comment-14984574
 ] 

Yu Ishikawa commented on SPARK-6001:


[~josephkb] can we close this issue? 

> K-Means clusterer should return the assignments of input points to clusters
> ---
>
> Key: SPARK-6001
> URL: https://issues.apache.org/jira/browse/SPARK-6001
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Affects Versions: 1.2.1
>Reporter: Derrick Burns
>Priority: Minor
>
> The K-Means clusterer returns a KMeansModel that contains the cluster 
> centers. However, when available, I suggest that the K-Means clusterer also 
> return an RDD of the assignments of the input data to the clusters. While the 
> assignments can be computed given the KMeansModel, why not return assignments 
> if they are available to save re-computation costs.
> The K-means implementation at 
> https://github.com/derrickburns/generalized-kmeans-clustering returns the 
> assignments when available.
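
For context, the assignments are indeed a one-pass computation given the model; a minimal 
sketch with spark.mllib, where the {{points}} RDD and the iteration count are assumptions 
for illustration:

{code}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

// Train a model and recover the cluster assignment of every input point
// with one extra pass over the data.
def assignments(points: RDD[Vector], k: Int): RDD[(Vector, Int)] = {
  val model = KMeans.train(points, k, maxIterations = 20)
  points.map(p => (p, model.predict(p)))
}
{code}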






[jira] [Commented] (SPARK-10266) Add @Since annotation to ml.tuning

2015-10-28 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14979152#comment-14979152
 ] 

Yu Ishikawa commented on SPARK-10266:
-

I'll work on this issue instead of [~Ehsan Mohyedin Kermani].

> Add @Since annotation to ml.tuning
> --
>
> Key: SPARK-10266
> URL: https://issues.apache.org/jira/browse/SPARK-10266
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, ML
>Reporter: Xiangrui Meng
>Assignee: Ehsan Mohyedin Kermani
>Priority: Minor
>  Labels: starter
>







[jira] [Commented] (SPARK-10285) Add @since annotation to pyspark.ml.util

2015-09-17 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791815#comment-14791815
 ] 

Yu Ishikawa commented on SPARK-10285:
-

Close this PR because those are non-public APIs.

> Add @since annotation to pyspark.ml.util
> 
>
> Key: SPARK-10285
> URL: https://issues.apache.org/jira/browse/SPARK-10285
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, ML, PySpark
>Reporter: Xiangrui Meng
>Assignee: Yu Ishikawa
>Priority: Minor
>  Labels: starter
>







[jira] [Updated] (SPARK-10512) Fix @since when a function doesn't have doc

2015-09-09 Thread Yu Ishikawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Ishikawa updated SPARK-10512:

Description: 
When I tried to add @since to a function which doesn't have a docstring, @since 
didn't work. It seems that {{__doc__}} is {{None}} under the {{since}} decorator.

{noformat}
Traceback (most recent call last):
  File "/Users/01004981/.pyenv/versions/2.6.8/lib/python2.6/runpy.py", line 
122, in _run_module_as_main
"__main__", fname, loader, pkg_name)
  File "/Users/01004981/.pyenv/versions/2.6.8/lib/python2.6/runpy.py", line 34, 
in _run_code
exec code in run_globals
  File 
"/Users/01004981/local/src/spark/myspark3/python/pyspark/mllib/recommendation.py",
 line 46, in 
class MatrixFactorizationModel(JavaModelWrapper, JavaSaveable, JavaLoader):
  File 
"/Users/01004981/local/src/spark/myspark3/python/pyspark/mllib/recommendation.py",
 line 166, in MatrixFactorizationModel
@since("1.3.1")
  File "/Users/01004981/local/src/spark/myspark3/python/pyspark/__init__.py", 
line 63, in deco
indents = indent_p.findall(f.__doc__)
TypeError: expected string or buffer
{noformat}

  was:
When I tried to add @since to a function which doesn't have doc, @since didn't 
go well. It seems that {{___doc___}} is {{None]} under {{since}} decorator.

{noformat}
Traceback (most recent call last):
  File "/Users/01004981/.pyenv/versions/2.6.8/lib/python2.6/runpy.py", line 
122, in _run_module_as_main
"__main__", fname, loader, pkg_name)
  File "/Users/01004981/.pyenv/versions/2.6.8/lib/python2.6/runpy.py", line 34, 
in _run_code
exec code in run_globals
  File 
"/Users/01004981/local/src/spark/myspark3/python/pyspark/mllib/recommendation.py",
 line 46, in 
class MatrixFactorizationModel(JavaModelWrapper, JavaSaveable, JavaLoader):
  File 
"/Users/01004981/local/src/spark/myspark3/python/pyspark/mllib/recommendation.py",
 line 166, in MatrixFactorizationModel
@since("1.3.1")
  File "/Users/01004981/local/src/spark/myspark3/python/pyspark/__init__.py", 
line 63, in deco
indents = indent_p.findall(f.__doc__)
TypeError: expected string or buffer
{noformat}


> Fix @since when a function doesn't have doc
> ---
>
> Key: SPARK-10512
> URL: https://issues.apache.org/jira/browse/SPARK-10512
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 1.6.0
>Reporter: Yu Ishikawa
>
> When I tried to add @since to a function which doesn't have a docstring, @since 
> didn't work. It seems that {{__doc__}} is {{None}} under the {{since}} 
> decorator.
> {noformat}
> Traceback (most recent call last):
>   File "/Users/01004981/.pyenv/versions/2.6.8/lib/python2.6/runpy.py", line 
> 122, in _run_module_as_main
> "__main__", fname, loader, pkg_name)
>   File "/Users/01004981/.pyenv/versions/2.6.8/lib/python2.6/runpy.py", line 
> 34, in _run_code
> exec code in run_globals
>   File 
> "/Users/01004981/local/src/spark/myspark3/python/pyspark/mllib/recommendation.py",
>  line 46, in 
> class MatrixFactorizationModel(JavaModelWrapper, JavaSaveable, 
> JavaLoader):
>   File 
> "/Users/01004981/local/src/spark/myspark3/python/pyspark/mllib/recommendation.py",
>  line 166, in MatrixFactorizationModel
> @since("1.3.1")
>   File "/Users/01004981/local/src/spark/myspark3/python/pyspark/__init__.py", 
> line 63, in deco
> indents = indent_p.findall(f.__doc__)
> TypeError: expected string or buffer
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10512) Fix @since when a function doesn't have doc

2015-09-09 Thread Yu Ishikawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Ishikawa updated SPARK-10512:

Description: 
When I tried to add @since to a function which doesn't have doc, @since didn't 
go well. It seems that {{___doc___}} is {{None]} under {{since}} decorator.

{noformat}
Traceback (most recent call last):
  File "/Users/01004981/.pyenv/versions/2.6.8/lib/python2.6/runpy.py", line 
122, in _run_module_as_main
"__main__", fname, loader, pkg_name)
  File "/Users/01004981/.pyenv/versions/2.6.8/lib/python2.6/runpy.py", line 34, 
in _run_code
exec code in run_globals
  File 
"/Users/01004981/local/src/spark/myspark3/python/pyspark/mllib/recommendation.py",
 line 46, in 
class MatrixFactorizationModel(JavaModelWrapper, JavaSaveable, JavaLoader):
  File 
"/Users/01004981/local/src/spark/myspark3/python/pyspark/mllib/recommendation.py",
 line 166, in MatrixFactorizationModel
@since("1.3.1")
  File "/Users/01004981/local/src/spark/myspark3/python/pyspark/__init__.py", 
line 63, in deco
indents = indent_p.findall(f.__doc__)
TypeError: expected string or buffer
{noformat}

  was:
When I tried to add @since to a function which doesn't have doc, @since didn't 
go well. It seems that {{___doc___}} is {{None]} under {{since}} decorator.

```
Traceback (most recent call last):
  File "/Users/01004981/.pyenv/versions/2.6.8/lib/python2.6/runpy.py", line 
122, in _run_module_as_main
"__main__", fname, loader, pkg_name)
  File "/Users/01004981/.pyenv/versions/2.6.8/lib/python2.6/runpy.py", line 34, 
in _run_code
exec code in run_globals
  File 
"/Users/01004981/local/src/spark/myspark3/python/pyspark/mllib/recommendation.py",
 line 46, in 
class MatrixFactorizationModel(JavaModelWrapper, JavaSaveable, JavaLoader):
  File 
"/Users/01004981/local/src/spark/myspark3/python/pyspark/mllib/recommendation.py",
 line 166, in MatrixFactorizationModel
@since("1.3.1")
  File "/Users/01004981/local/src/spark/myspark3/python/pyspark/__init__.py", 
line 63, in deco
indents = indent_p.findall(f.__doc__)
TypeError: expected string or buffer
```


> Fix @since when a function doesn't have doc
> ---
>
> Key: SPARK-10512
> URL: https://issues.apache.org/jira/browse/SPARK-10512
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 1.6.0
>Reporter: Yu Ishikawa
>
> When I tried to add @since to a function which doesn't have doc, @since 
> didn't go well. It seems that {{___doc___}} is {{None]} under {{since}} 
> decorator.
> {noformat}
> Traceback (most recent call last):
>   File "/Users/01004981/.pyenv/versions/2.6.8/lib/python2.6/runpy.py", line 
> 122, in _run_module_as_main
> "__main__", fname, loader, pkg_name)
>   File "/Users/01004981/.pyenv/versions/2.6.8/lib/python2.6/runpy.py", line 
> 34, in _run_code
> exec code in run_globals
>   File 
> "/Users/01004981/local/src/spark/myspark3/python/pyspark/mllib/recommendation.py",
>  line 46, in 
> class MatrixFactorizationModel(JavaModelWrapper, JavaSaveable, 
> JavaLoader):
>   File 
> "/Users/01004981/local/src/spark/myspark3/python/pyspark/mllib/recommendation.py",
>  line 166, in MatrixFactorizationModel
> @since("1.3.1")
>   File "/Users/01004981/local/src/spark/myspark3/python/pyspark/__init__.py", 
> line 63, in deco
> indents = indent_p.findall(f.__doc__)
> TypeError: expected string or buffer
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10512) Fix @since when a function doesn't have doc

2015-09-09 Thread Yu Ishikawa (JIRA)
Yu Ishikawa created SPARK-10512:
---

 Summary: Fix @since when a function doesn't have doc
 Key: SPARK-10512
 URL: https://issues.apache.org/jira/browse/SPARK-10512
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 1.6.0
Reporter: Yu Ishikawa


When I tried to add @since to a function which doesn't have doc, @since didn't 
go well. It seems that {{___doc___}} is {{None]} under {{since}} decorator.

```
Traceback (most recent call last):
  File "/Users/01004981/.pyenv/versions/2.6.8/lib/python2.6/runpy.py", line 
122, in _run_module_as_main
"__main__", fname, loader, pkg_name)
  File "/Users/01004981/.pyenv/versions/2.6.8/lib/python2.6/runpy.py", line 34, 
in _run_code
exec code in run_globals
  File 
"/Users/01004981/local/src/spark/myspark3/python/pyspark/mllib/recommendation.py",
 line 46, in 
class MatrixFactorizationModel(JavaModelWrapper, JavaSaveable, JavaLoader):
  File 
"/Users/01004981/local/src/spark/myspark3/python/pyspark/mllib/recommendation.py",
 line 166, in MatrixFactorizationModel
@since("1.3.1")
  File "/Users/01004981/local/src/spark/myspark3/python/pyspark/__init__.py", 
line 63, in deco
indents = indent_p.findall(f.__doc__)
TypeError: expected string or buffer
```
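One possible fix (a minimal sketch with assumed names, not the actual pyspark 
{{since}} implementation) is to treat a missing docstring as an empty string 
inside the decorator:

{noformat}
import re

def since(version):
    indent_p = re.compile(r'\n( +)')

    def deco(f):
        doc = f.__doc__ or ""                     # guard: __doc__ may be None
        indents = indent_p.findall(doc)
        indent = ' ' * (min(len(i) for i in indents) if indents else 0)
        f.__doc__ = doc.rstrip() + "\n\n%s.. versionadded:: %s" % (indent, version)
        return f
    return deco

@since("1.3.1")
def predict(user, product):                       # no docstring: no longer raises
    return 0.0

print(predict.__doc__)
{noformat}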



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10276) Add @since annotation to pyspark.mllib.recommendation

2015-09-09 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14736346#comment-14736346
 ] 

Yu Ishikawa commented on SPARK-10276:
-

[~mengxr] should we add `@since` to the class methods decorated with `@classmethod` in 
PySpark? When I tried to do that, I got the error below. It seems that we 
can't rewrite {{__doc__}} of a `classmethod` object.

{noformat}
Traceback (most recent call last):
  File "/Users/01004981/.pyenv/versions/2.6.8/lib/python2.6/runpy.py", line 
122, in _run_module_as_main
"__main__", fname, loader, pkg_name)
  File "/Users/01004981/.pyenv/versions/2.6.8/lib/python2.6/runpy.py", line 34, 
in _run_code
exec code in run_globals
  File 
"/Users/01004981/local/src/spark/myspark3/python/pyspark/mllib/recommendation.py",
 line 46, in 
class MatrixFactorizationModel(JavaModelWrapper, JavaSaveable, JavaLoader):
  File 
"/Users/01004981/local/src/spark/myspark3/python/pyspark/mllib/recommendation.py",
 line 175, in MatrixFactorizationModel
@classmethod
  File "/Users/01004981/local/src/spark/myspark3/python/pyspark/__init__.py", 
line 62, in deco
f.__doc__ = f.__doc__.rstrip() + "\n\n%s.. versionadded:: %s" % (indent, 
version)
AttributeError: 'classmethod' object attribute '__doc__' is read-only
{noformat}
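A small standalone sketch of the underlying limitation (the class and method 
names here are made up for illustration): on Python 2, the {{__doc__}} attribute 
of a {{classmethod}} object is read-only, so a decorator applied on top of 
{{@classmethod}} cannot rewrite the docstring.

{noformat}
class Model(object):
    @classmethod
    def load(cls, path):
        """Load a model from the given path."""
        return cls()

# The raw classmethod object from the class dict, not the bound method
cm = Model.__dict__['load']

try:
    cm.__doc__ = "Load a model.\n\n.. versionadded:: 1.3.1"
except AttributeError as e:
    # Python 2: 'classmethod' object attribute '__doc__' is read-only
    print(e)
{noformat}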

> Add @since annotation to pyspark.mllib.recommendation
> -
>
> Key: SPARK-10276
> URL: https://issues.apache.org/jira/browse/SPARK-10276
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, MLlib, PySpark
>Reporter: Xiangrui Meng
>Priority: Minor
>  Labels: starter
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10512) Fix @since when a function doesn't have doc

2015-09-09 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14736973#comment-14736973
 ] 

Yu Ishikawa commented on SPARK-10512:
-

[~davies] oh, I see. Thank you for letting me know.

> Fix @since when a function doesn't have doc
> ---
>
> Key: SPARK-10512
> URL: https://issues.apache.org/jira/browse/SPARK-10512
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 1.6.0
>Reporter: Yu Ishikawa
>
> When I tried to add @since to a function which doesn't have a docstring, @since 
> didn't work. It seems that {{__doc__}} is {{None}} under the {{since}} 
> decorator.
> {noformat}
> Traceback (most recent call last):
>   File "/Users/01004981/.pyenv/versions/2.6.8/lib/python2.6/runpy.py", line 
> 122, in _run_module_as_main
> "__main__", fname, loader, pkg_name)
>   File "/Users/01004981/.pyenv/versions/2.6.8/lib/python2.6/runpy.py", line 
> 34, in _run_code
> exec code in run_globals
>   File 
> "/Users/01004981/local/src/spark/myspark3/python/pyspark/mllib/recommendation.py",
>  line 46, in 
> class MatrixFactorizationModel(JavaModelWrapper, JavaSaveable, 
> JavaLoader):
>   File 
> "/Users/01004981/local/src/spark/myspark3/python/pyspark/mllib/recommendation.py",
>  line 166, in MatrixFactorizationModel
> @since("1.3.1")
>   File "/Users/01004981/local/src/spark/myspark3/python/pyspark/__init__.py", 
> line 63, in deco
> indents = indent_p.findall(f.__doc__)
> TypeError: expected string or buffer
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10276) Add @since annotation to pyspark.mllib.recommendation

2015-09-09 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14738002#comment-14738002
 ] 

Yu Ishikawa commented on SPARK-10276:
-

It seems that `@since` depends on the order of the decorators. 

{noformat}
# Works: since() is applied to the raw function before classmethod wraps it
@classmethod
@since("1.4.0")
def foo(cls):
    ...

# Does not work: since() receives a classmethod object, whose __doc__ is read-only
@since("1.4.0")
@classmethod
def bar(cls):
    ...
{noformat}
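A self-contained sketch (with a toy {{since}}, not the real one) of why the 
order matters: with {{@classmethod}} outermost, {{since}} still receives a plain 
function whose {{__doc__}} is writable; with {{@since}} outermost, it receives 
the already-wrapped {{classmethod}} object and fails.

{noformat}
def since(version):
    def deco(f):
        # Only works while f is still a plain function; a classmethod object's
        # __doc__ is read-only on Python 2.
        f.__doc__ = (f.__doc__ or "") + "\n\n.. versionadded:: %s" % version
        return f
    return deco

class Example(object):
    @classmethod              # applied last, so since() decorated the raw function
    @since("1.4.0")
    def foo(cls):
        """Works."""

    # Reversing the order would hand since() a classmethod object and raise
    # AttributeError when it tries to set __doc__:
    #
    # @since("1.4.0")
    # @classmethod
    # def bar(cls):
    #     """Does not work."""

print(Example.foo.__doc__)    # "Works." plus the versionadded note
{noformat}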

> Add @since annotation to pyspark.mllib.recommendation
> -
>
> Key: SPARK-10276
> URL: https://issues.apache.org/jira/browse/SPARK-10276
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, MLlib, PySpark
>Reporter: Xiangrui Meng
>Priority: Minor
>  Labels: starter
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8467) Add LDAModel.describeTopics() in Python

2015-09-07 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14733729#comment-14733729
 ] 

Yu Ishikawa commented on SPARK-8467:


Sorry for the delay in my reply. I just sent a PR for this issue. Thanks.

> Add LDAModel.describeTopics() in Python
> ---
>
> Key: SPARK-8467
> URL: https://issues.apache.org/jira/browse/SPARK-8467
> Project: Spark
>  Issue Type: New Feature
>  Components: MLlib, PySpark
>Reporter: Yu Ishikawa
>
> Add LDAModel.describeTopics() in Python.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10260) Add @Since annotation to ml.clustering

2015-08-26 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712682#comment-14712682
 ] 

Yu Ishikawa commented on SPARK-10260:
-

I'll work on this issue.

 Add @Since annotation to ml.clustering
 --

 Key: SPARK-10260
 URL: https://issues.apache.org/jira/browse/SPARK-10260
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, ML
Reporter: Xiangrui Meng
Priority: Minor
  Labels: starter





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10214) Improve SparkR Column, DataFrame API docs

2015-08-24 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14710528#comment-14710528
 ] 

Yu Ishikawa commented on SPARK-10214:
-

[~shivaram] Apart from this issue, how come the RDD functions aren't written as 
roxygen2 documentation? That is, no Rd files are generated for them, since their 
comment lines begin with {{#}}, not {{#'}}.

https://github.com/apache/spark/blob/master/R/pkg/R/RDD.R#L114

 Improve SparkR Column, DataFrame API docs
 -

 Key: SPARK-10214
 URL: https://issues.apache.org/jira/browse/SPARK-10214
 Project: Spark
  Issue Type: Documentation
  Components: SparkR
Reporter: Shivaram Venkataraman

 Right now the docs for functions like `agg` and `filter` have duplicate 
 entries like `agg-method` and `filter-method` etc. We should use the `name` 
 Rd tag and remove these duplicates.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10214) Improve SparkR Column, DataFrame API docs

2015-08-24 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14710578#comment-14710578
 ] 

Yu Ishikawa commented on SPARK-10214:
-

I understand. Thanks!

 Improve SparkR Column, DataFrame API docs
 -

 Key: SPARK-10214
 URL: https://issues.apache.org/jira/browse/SPARK-10214
 Project: Spark
  Issue Type: Documentation
  Components: SparkR
Reporter: Shivaram Venkataraman

 Right now the docs for functions like `agg` and `filter` have duplicate 
 entries like `agg-method` and `filter-method` etc. We should use the `name` 
 Rd tag and remove these duplicates.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10118) Improve SparkR API docs for 1.5 release

2015-08-23 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708645#comment-14708645
 ] 

Yu Ishikawa commented on SPARK-10118:
-

[~shivaram] sure. I'll send a PR about that later.

 Improve SparkR API docs for 1.5 release
 ---

 Key: SPARK-10118
 URL: https://issues.apache.org/jira/browse/SPARK-10118
 Project: Spark
  Issue Type: Documentation
  Components: Documentation, SparkR
Reporter: Shivaram Venkataraman

 This includes checking if the new DataFrame functions and expressions show up 
 appropriately in the roxygen docs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-10118) Improve SparkR API docs for 1.5 release

2015-08-21 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14707631#comment-14707631
 ] 

Yu Ishikawa edited comment on SPARK-10118 at 8/21/15 11:22 PM:
---

[~shivaram] It seems that the Rd files of dplyr and plyr are split per 
function. I asked the creator of roxygen2 whether we should split functions.Rd, 
and he answered that it would improve readability. I also think {{@family}} 
would be useful. 

h3. Suggestion
- Split functions.Rd into one Rd file per function
- Use {{@family}} to relate the functions in the API docs. It would be better to 
follow Scala's {{@group}}

For example, {{add_months}} is documented:
{noformat}
# generic.R

#' @rdname add_months
#' @export
setGeneric("add_months", function(y, x) { standardGeneric("add_months") })

# functions.R
#' add_months
#'
#' Returns the date that is numMonths after startDate.
#'
#' @family datetime_funcs
#' @rdname add_months
#' @export
setMethod("add_months", signature(y = "Column", x = "numeric"),
          function(y, x) {
            jc <- callJStatic("org.apache.spark.sql.functions", "add_months",
                              y@jc, as.integer(x))
            column(jc)
          })
{noformat}


h3. The Rd files of dplyr and plyr
- https://github.com/hadley/dplyr/tree/master/man
- https://github.com/hadley/plyr/tree/master/man

h3. Reference of {{@family}}

{quote}
If you have a family of related functions where every function should link to 
every other function in the family, use @family. The value of @family should be 
plural.
{quote}
http://r-pkgs.had.co.nz/man.html



was (Author: yuu.ishik...@gmail.com):
[~shivaram] It seems that the Rd files of dplr and plyr were split by each 
function. And I asked a creater of roxygen2 if we should split functins.Rd, he 
answerd that would improve readability!. And I think {{@family}} would be 
useful. 

h3. Suggestion
- Splits functions.Rd into each functions' Rd file
- Uses {{@family}} to relate the functions on API docs. It would be better to 
follow Scala {{@group}}

For example, {{add_months}} is documented:
{noformat}
# generic.R

#' @rdname add_months
#' @export
setGeneric(add_months, function(y, x) { standardGeneric(add_months) })

# functions.R
#' add_months
#'
#' Returns the date that is numMonths after startDate.
#'
#' @family datetime_funcs
#' @rdname add_months
#' @export
setMethod(add_months, signature(y = Column, x = numeric),
  function(y, x) {
jc - callJStatic(org.apache.spark.sql.functions, add_months, 
y@jc, as.integer(x))
column(jc)
  })
{noformat}


h3. The Rd files of dplr and plyr
- https://github.com/hadley/dplyr/tree/master/R
- https://github.com/hadley/plyr/tree/master/man

h3. Reference of {{@family}}

{quote}
If you have a family of related functions where every function should link to 
every other function in the family, use @family. The value of @family should be 
plural.
{quote}
http://r-pkgs.had.co.nz/man.html


 Improve SparkR API docs for 1.5 release
 ---

 Key: SPARK-10118
 URL: https://issues.apache.org/jira/browse/SPARK-10118
 Project: Spark
  Issue Type: Documentation
  Components: Documentation, SparkR
Reporter: Shivaram Venkataraman

 This includes checking if the new DataFrame functions and expressions show up 
 appropriately in the roxygen docs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10118) Improve SparkR API docs for 1.5 release

2015-08-21 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14707631#comment-14707631
 ] 

Yu Ishikawa commented on SPARK-10118:
-

[~shivaram] It seems that the Rd files of dplyr and plyr are split per 
function. I asked the creator of roxygen2 whether we should split functions.Rd, 
and he answered that it would improve readability. I also think {{@family}} 
would be useful. 

h3. Suggestion
- Split functions.Rd into one Rd file per function
- Use {{@family}} to relate the functions in the API docs. It would be better to 
follow Scala's {{@group}}

For example, {{add_months}} is documented:
{noformat}
# generic.R

#' @rdname add_months
#' @export
setGeneric("add_months", function(y, x) { standardGeneric("add_months") })

# functions.R
#' add_months
#'
#' Returns the date that is numMonths after startDate.
#'
#' @family datetime_funcs
#' @rdname add_months
#' @export
setMethod("add_months", signature(y = "Column", x = "numeric"),
          function(y, x) {
            jc <- callJStatic("org.apache.spark.sql.functions", "add_months",
                              y@jc, as.integer(x))
            column(jc)
          })
{noformat}


h3. The Rd files of dplyr and plyr
- https://github.com/hadley/dplyr/tree/master/R
- https://github.com/hadley/plyr/tree/master/man

h3. Reference of {{@family}}

{quote}
If you have a family of related functions where every function should link to 
every other function in the family, use @family. The value of @family should be 
plural.
{quote}
http://r-pkgs.had.co.nz/man.html


 Improve SparkR API docs for 1.5 release
 ---

 Key: SPARK-10118
 URL: https://issues.apache.org/jira/browse/SPARK-10118
 Project: Spark
  Issue Type: Documentation
  Components: Documentation, SparkR
Reporter: Shivaram Venkataraman

 This includes checking if the new DataFrame functions and expressions show up 
 appropriately in the roxygen docs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9427) Add expression functions in SparkR

2015-08-19 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14702552#comment-14702552
 ] 

Yu Ishikawa commented on SPARK-9427:


Alright.

 Add expression functions in SparkR
 --

 Key: SPARK-9427
 URL: https://issues.apache.org/jira/browse/SPARK-9427
 Project: Spark
  Issue Type: New Feature
  Components: SparkR
Reporter: Yu Ishikawa

 The list of functions to add is based on SQL's functions. And it would be 
 better to add them in one shot PR.
 https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-6813) SparkR style guide

2015-08-19 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-6813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14702556#comment-14702556
 ] 

Yu Ishikawa commented on SPARK-6813:


We did it. I appreciate your support.

 SparkR style guide
 --

 Key: SPARK-6813
 URL: https://issues.apache.org/jira/browse/SPARK-6813
 Project: Spark
  Issue Type: New Feature
  Components: SparkR
Reporter: Shivaram Venkataraman
Assignee: Yu Ishikawa
 Fix For: 1.5.0


 We should develop a SparkR style guide document based on some of the 
 guidelines we use and some of the best practices in R.
 Some examples of R style guides are:
 http://r-pkgs.had.co.nz/r.html#style 
 http://google-styleguide.googlecode.com/svn/trunk/google-r-style.html
 A related issue is to work on an automatic style checking tool. 
 https://github.com/jimhester/lintr seems promising
 We could have an R style guide based on the one from Google [1], and adjust 
 some of them based on the discussion in Spark:
 1. Line Length: maximum 100 characters
 2. no limit on function name (API should be similar as in other languages)
 3. Allow S4 objects/methods



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9427) Add expression functions in SparkR

2015-08-19 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14702563#comment-14702563
 ] 

Yu Ishikawa commented on SPARK-9427:


I see. Thank you for letting me know. 

 Add expression functions in SparkR
 --

 Key: SPARK-9427
 URL: https://issues.apache.org/jira/browse/SPARK-9427
 Project: Spark
  Issue Type: New Feature
  Components: SparkR
Reporter: Yu Ishikawa

 The list of functions to add is based on SQL's functions. And it would be 
 better to add them in one shot PR.
 https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10106) Add `ifelse` Column function to SparkR

2015-08-18 Thread Yu Ishikawa (JIRA)
Yu Ishikawa created SPARK-10106:
---

 Summary: Add `ifelse` Column function to SparkR
 Key: SPARK-10106
 URL: https://issues.apache.org/jira/browse/SPARK-10106
 Project: Spark
  Issue Type: New Feature
  Components: SparkR
Reporter: Yu Ishikawa


Add a column function on a DataFrame like `ifelse` in R to SparkR.
I guess we could implement it with a combination of {{when}} and 
{{otherwise}}.

h3. Example

If {{df$x > 0}} is TRUE, then return 0, else return 1.
{noformat}
ifelse(df$x > 0, 0, 1)
{noformat}
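For illustration, the analogous composition in PySpark (assuming a DataFrame 
{{df}} with a numeric column {{x}} and an existing SparkContext {{sc}}); the 
SparkR version would wrap the same {{when}}/{{otherwise}} calls:

{noformat}
from pyspark.sql import SQLContext
from pyspark.sql.functions import when

sqlContext = SQLContext(sc)
df = sqlContext.createDataFrame([(-1.0,), (2.0,)], ["x"])

# Equivalent of R's ifelse(df$x > 0, 0, 1)
df.select(when(df.x > 0, 0).otherwise(1)).show()
{noformat}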



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-10106) Add `ifelse` Column function to SparkR

2015-08-18 Thread Yu Ishikawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-10106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Ishikawa updated SPARK-10106:

Description: 
Add a column function on a DataFrame like `ifelse` in R to SparkR.
I guess we could implement it with a combination of {{when}} and 
{{otherwise}}.

h3. Example

If {{df$x > 0}} is TRUE, then return 0, otherwise return 1.
{noformat}
ifelse(df$x > 0, 0, 1)
{noformat}

  was:
Add a column function on a DataFrame like `ifelse` in R to SparkR.
I guess we could implement it with a combination with {{when}} and 
{{otherwise}}.

h3. Example

If {{df$x  0}} is TRUE, then return 0, else return 1.
{noformat}
ifelse(df$x  0, 0, 1)
{noformat}


 Add `ifelse` Column function to SparkR
 --

 Key: SPARK-10106
 URL: https://issues.apache.org/jira/browse/SPARK-10106
 Project: Spark
  Issue Type: New Feature
  Components: SparkR
Reporter: Yu Ishikawa

 Add a column function on a DataFrame like `ifelse` in R to SparkR.
 I guess we could implement it with a combination of {{when}} and 
 {{otherwise}}.
 h3. Example
 If {{df$x > 0}} is TRUE, then return 0, otherwise return 1.
 {noformat}
 ifelse(df$x > 0, 0, 1)
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10079) Make `column` and `col` functions be S4 functions

2015-08-17 Thread Yu Ishikawa (JIRA)
Yu Ishikawa created SPARK-10079:
---

 Summary: Make `column` and `col` functions be S4 functions
 Key: SPARK-10079
 URL: https://issues.apache.org/jira/browse/SPARK-10079
 Project: Spark
  Issue Type: Sub-task
  Components: SparkR
Reporter: Yu Ishikawa


{{column}} and {{col}} function at {{R/pkg/R/Column.R}} are currently defined 
as S3 functions. I think it would be better to define them as S4 functions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9972) Add `struct`, `encode` and `decode` function in SparkR

2015-08-17 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699237#comment-14699237
 ] 

Yu Ishikawa commented on SPARK-9972:


This is a quick note on the cause. When I tried to implement 
{{sort_array}}, I got the error below. I haven't fully inspected it, but the 
cause seems to be in {{collect}}. I'll comment on it in detail later.

{noformat}
1. Error: sort_array on a DataFrame 
cannot coerce class jobj to a data.frame
1: withCallingHandlers(eval(code, new_test_environment), error = capture_calls, 
message = function(c) invokeRestart(muffleMessage),
   warning = function(c) invokeRestart(muffleWarning))
2: eval(code, new_test_environment)
3: eval(expr, envir, enclos)
4: expect_equal(collect(select(df, sort_array(df$a)))[1, 1], c(1, 2, 3)) at 
test_sparkSQL.R:787
5: expect_that(object, equals(expected, label = expected.label, ...), info = 
info, label = label)
6: condition(object)
7: compare(expected, actual, ...)
8: compare.numeric(expected, actual, ...)
9: all.equal(x, y, ...)
10: all.equal.numeric(x, y, ...)
11: attr.all.equal(target, current, tolerance = tolerance, scale = scale, ...)
12: mode(current)
13: collect(select(df, sort_array(df$a)))
14: collect(select(df, sort_array(df$a)))
15: .local(x, ...)
16: do.call(cbind.data.frame, list(cols, stringsAsFactors = stringsAsFactors))
17: (function (..., deparse.level = 1)
   data.frame(..., check.names = FALSE))(structure(list(`sort_array(a,true)` = 
list(
   environment, NA, NA)), .Names = sort_array(a,true)), 
stringsAsFactors = FALSE)
18: data.frame(..., check.names = FALSE)
19: as.data.frame(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors)
20: as.data.frame.list(x[[i]], optional = TRUE, stringsAsFactors = 
stringsAsFactors)
21: eval(as.call(c(expression(data.frame), x, check.names = !optional, 
stringsAsFactors = stringsAsFactors)))
22: eval(expr, envir, enclos)
23: data.frame(`sort_array(a,true)` = list(environment, NA, NA), check.names 
= FALSE,
   stringsAsFactors = FALSE)
24: as.data.frame(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors)
25: as.data.frame.list(x[[i]], optional = TRUE, stringsAsFactors = 
stringsAsFactors)
26: eval(as.call(c(expression(data.frame), x, check.names = !optional, 
stringsAsFactors = stringsAsFactors)))
27: eval(expr, envir, enclos)
28: data.frame(environment, NA, NA, check.names = FALSE, stringsAsFactors = 
FALSE)
29: as.data.frame(x[[i]], optional = TRUE)
30: as.data.frame.default(x[[i]], optional = TRUE)
31: stop(gettextf(cannot coerce class \%s\ to a data.frame, 
deparse(class(x))), domain = NA)
32: .handleSimpleError(function (e)
   {
   e$calls - head(sys.calls()[-seq_len(frame + 7)], -2)
   signalCondition(e)
   }, cannot coerce class \\jobj\\ to a data.frame, 
quote(as.data.frame.default(x[[i]],
   optional = TRUE)))
{noformat}

 Add `struct`, `encode` and `decode` function in SparkR
 --

 Key: SPARK-9972
 URL: https://issues.apache.org/jira/browse/SPARK-9972
 Project: Spark
  Issue Type: Sub-task
  Components: SparkR
Reporter: Yu Ishikawa

 Support {{struct}} function on a DataFrame in SparkR. However, I think we 
 need to improve {{collect}} function in SparkR in order to implement 
 {{struct}} function.
 - struct
 - encode
 - decode
 - array_contains
 - sort_array



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10075) Add `when` expressino function in SparkR

2015-08-17 Thread Yu Ishikawa (JIRA)
Yu Ishikawa created SPARK-10075:
---

 Summary: Add `when` expressino function in SparkR
 Key: SPARK-10075
 URL: https://issues.apache.org/jira/browse/SPARK-10075
 Project: Spark
  Issue Type: Sub-task
  Components: SparkR
Reporter: Yu Ishikawa


Add {{when}} function into SparkR. Before this issue, we need to implement 
{{when}}, {{otherwise}} and so on as {{Column}} methods.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10079) Make `column` and `col` functions be S4 functions

2015-08-17 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700721#comment-14700721
 ] 

Yu Ishikawa commented on SPARK-10079:
-

I see. I misunderstood them in SparkR since {{column}} and {{col}} in Scala are 
{{normal_funcs}}. If we should keep them private, I'll close this issue after 
[~davies]'s comment. Thanks!

 Make `column` and `col` functions be S4 functions
 -

 Key: SPARK-10079
 URL: https://issues.apache.org/jira/browse/SPARK-10079
 Project: Spark
  Issue Type: Sub-task
  Components: SparkR
Reporter: Yu Ishikawa

 {{column}} and {{col}} function at {{R/pkg/R/Column.R}} are currently defined 
 as S3 functions. I think it would be better to define them as S4 functions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9856) Add expression functions into SparkR whose params are complicated

2015-08-17 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699124#comment-14699124
 ] 

Yu Ishikawa commented on SPARK-9856:


[~sunrui] Sorry, I'm working on this issue. I'll send a PR about this issue 
soon.

 Add expression functions into SparkR whose params are complicated
 -

 Key: SPARK-9856
 URL: https://issues.apache.org/jira/browse/SPARK-9856
 Project: Spark
  Issue Type: Sub-task
  Components: SparkR
Reporter: Yu Ishikawa

 Add expression functions whose parameters are a little complicated, like 
 {{regexp_extract(e: Column, exp: String, groupIdx: Int)}} and 
 {{regexp_replace(e: Column, pattern: String, replacement: String)}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10043) Add window functions into SparkR

2015-08-17 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699058#comment-14699058
 ] 

Yu Ishikawa commented on SPARK-10043:
-

[~shivaram] At least the {{lead}} function doesn't work. I haven't checked all of 
them yet. I'll check them and get back to you.

 Add window functions into SparkR
 

 Key: SPARK-10043
 URL: https://issues.apache.org/jira/browse/SPARK-10043
 Project: Spark
  Issue Type: Sub-task
  Components: SparkR
Reporter: Yu Ishikawa

 Add window functions as follows in SparkR. I think we should improve 
 {{collect}} function in SparkR.
 - lead
 - cumuDist
 - denseRank
 - lag
 - ntile
 - percentRank
 - rank
 - rowNumber



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9427) Add expression functions in SparkR

2015-08-17 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699072#comment-14699072
 ] 

Yu Ishikawa commented on SPARK-9427:


[~shivaram] and [~davies]

How do we convert R {{integer}} type to Scala {{Long}} type?
I'm having trouble implementing the {{rand(seed: Long)}} function in SparkR. An R 
{{integer}} is recognized as a Scala {{Int}} and an R {{numeric}} is 
recognized as a Scala {{Double}}, so I wonder how I should deal with 64-bit 
integers from R. I think we should add {{rand(seed: Int)}} to spark.sql on 
the Scala side. What do you think?

Plus, I guess PySpark {{rand}} doesn't work on Python 2.x for the same reason, 
because {{int}} on Python 2.x is recognized as a Scala {{Integer}} type.

 Add expression functions in SparkR
 --

 Key: SPARK-9427
 URL: https://issues.apache.org/jira/browse/SPARK-9427
 Project: Spark
  Issue Type: New Feature
  Components: SparkR
Reporter: Yu Ishikawa

 The list of functions to add is based on SQL's functions. And it would be 
 better to add them in one shot PR.
 https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10043) Add window functions into SparkR

2015-08-16 Thread Yu Ishikawa (JIRA)
Yu Ishikawa created SPARK-10043:
---

 Summary: Add window functions into SparkR
 Key: SPARK-10043
 URL: https://issues.apache.org/jira/browse/SPARK-10043
 Project: Spark
  Issue Type: Sub-task
  Components: SparkR
Reporter: Yu Ishikawa


Add window functions as follows in SparkR. I think we should improve 
{{collect}} function in SparkR.

- lead
- cumuDist
- denseRank
- lag
- ntile
- percentRank
- rank
- rowNumber



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-10043) Add window functions into SparkR

2015-08-16 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-10043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14698994#comment-14698994
 ] 

Yu Ishikawa commented on SPARK-10043:
-

As far as I know, there are no unit tests for the window functions in Scala / 
Python. At the least, we should add unit tests in Python.

 Add window functions into SparkR
 

 Key: SPARK-10043
 URL: https://issues.apache.org/jira/browse/SPARK-10043
 Project: Spark
  Issue Type: Sub-task
  Components: SparkR
Reporter: Yu Ishikawa

 Add window functions as follows in SparkR. I think we should improve 
 {{collect}} function in SparkR.
 - lead
 - cumuDist
 - denseRank
 - lag
 - ntile
 - percentRank
 - rank
 - rowNumber



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9871) Add expression functions into SparkR which have a variable parameter

2015-08-14 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696706#comment-14696706
 ] 

Yu Ishikawa commented on SPARK-9871:


I think it would be better to deal with {{struct}} in another issue, since 
{{collect}} doesn't work with a DataFrame which has a column of Struct type; we 
need to improve the {{collect}} method first.

When I tried to implement the {{struct}} function, a struct column converted by 
{{dfToCols}} consists of {{jobj}} environments:
{noformat}
List of 1
 $ structed:List of 2
  ..$ :Class 'jobj' environment: 0x7fd46efe4e68
  ..$ :Class 'jobj' environment: 0x7fd46efee078
{noformat}

 Add expression functions into SparkR which have a variable parameter
 

 Key: SPARK-9871
 URL: https://issues.apache.org/jira/browse/SPARK-9871
 Project: Spark
  Issue Type: Sub-task
  Components: SparkR
Reporter: Yu Ishikawa

 Add expression functions into SparkR which has a variable parameter, like 
 {{concat}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-9972) Add `struct` function in SparkR

2015-08-14 Thread Yu Ishikawa (JIRA)
Yu Ishikawa created SPARK-9972:
--

 Summary: Add `struct` function in SparkR
 Key: SPARK-9972
 URL: https://issues.apache.org/jira/browse/SPARK-9972
 Project: Spark
  Issue Type: Sub-task
  Components: SparkR
Reporter: Yu Ishikawa


Support {{struct}} function on a DataFrame in SparkR. However, I think we need 
to improve {{collect}} function in SparkR in order to implement {{struct}} 
function.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9972) Add `struct` function in SparkR

2015-08-14 Thread Yu Ishikawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Ishikawa updated SPARK-9972:
---
Target Version/s: 1.6.0  (was: 1.5.0)

 Add `struct` function in SparkR
 ---

 Key: SPARK-9972
 URL: https://issues.apache.org/jira/browse/SPARK-9972
 Project: Spark
  Issue Type: Sub-task
  Components: SparkR
Reporter: Yu Ishikawa

 Support {{struct}} function on a DataFrame in SparkR. However, I think we 
 need to improve {{collect}} function in SparkR in order to implement 
 {{struct}} function.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9972) Add `struct`, `encode` and `decode` function in SparkR

2015-08-14 Thread Yu Ishikawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Ishikawa updated SPARK-9972:
---
Summary: Add `struct`, `encode` and `decode` function in SparkR  (was: Add 
`struct` function in SparkR)

 Add `struct`, `encode` and `decode` function in SparkR
 --

 Key: SPARK-9972
 URL: https://issues.apache.org/jira/browse/SPARK-9972
 Project: Spark
  Issue Type: Sub-task
  Components: SparkR
Reporter: Yu Ishikawa

 Support {{struct}} function on a DataFrame in SparkR. However, I think we 
 need to improve {{collect}} function in SparkR in order to implement 
 {{struct}} function.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9972) Add `struct`, `encode` and `decode` function in SparkR

2015-08-14 Thread Yu Ishikawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Ishikawa updated SPARK-9972:
---
Description: 
Support {{struct}} function on a DataFrame in SparkR. However, I think we need 
to improve {{collect}} function in SparkR in order to implement {{struct}} 
function.

- struct
- encode
- decode
- array_contains


  was:Support {{struct}} function on a DataFrame in SparkR. However, I think we 
need to improve {{collect}} function in SparkR in order to implement {{struct}} 
function.


 Add `struct`, `encode` and `decode` function in SparkR
 --

 Key: SPARK-9972
 URL: https://issues.apache.org/jira/browse/SPARK-9972
 Project: Spark
  Issue Type: Sub-task
  Components: SparkR
Reporter: Yu Ishikawa

 Support {{struct}} function on a DataFrame in SparkR. However, I think we 
 need to improve {{collect}} function in SparkR in order to implement 
 {{struct}} function.
 - struct
 - encode
 - decode
 - array_contains



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9972) Add `struct`, `encode` and `decode` function in SparkR

2015-08-14 Thread Yu Ishikawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Ishikawa updated SPARK-9972:
---
Description: 
Support {{struct}} function on a DataFrame in SparkR. However, I think we need 
to improve {{collect}} function in SparkR in order to implement {{struct}} 
function.

- struct
- encode
- decode
- array_contains
- sort_array


  was:
Support {{struct}} function on a DataFrame in SparkR. However, I think we need 
to improve {{collect}} function in SparkR in order to implement {{struct}} 
function.

- struct
- encode
- decode
- array_contains



 Add `struct`, `encode` and `decode` function in SparkR
 --

 Key: SPARK-9972
 URL: https://issues.apache.org/jira/browse/SPARK-9972
 Project: Spark
  Issue Type: Sub-task
  Components: SparkR
Reporter: Yu Ishikawa

 Support {{struct}} function on a DataFrame in SparkR. However, I think we 
 need to improve {{collect}} function in SparkR in order to implement 
 {{struct}} function.
 - struct
 - encode
 - decode
 - array_contains
 - sort_array



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-10007) Update `NAMESPACE` file in SparkR for simple parameters functions

2015-08-14 Thread Yu Ishikawa (JIRA)
Yu Ishikawa created SPARK-10007:
---

 Summary: Update `NAMESPACE` file in SparkR for simple parameters 
functions
 Key: SPARK-10007
 URL: https://issues.apache.org/jira/browse/SPARK-10007
 Project: Spark
  Issue Type: Sub-task
  Components: SparkR
Reporter: Yu Ishikawa


I'm afraid I forgot to update the {{NAMESPACE}} file for the simple-parameter 
functions, such as {{ascii}}, {{base64}} and so on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8240) string function: concat

2015-08-13 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696456#comment-14696456
 ] 

Yu Ishikawa commented on SPARK-8240:


[~rxin] I think it would be more natural to support arguments which consist of a mix 
of {{Column}} and {{String}}. What do you think?

h4. Example.
{noformat}
concat(colA, ", ", colB, ", ", colC)
{noformat}
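For comparison, this is how mixed column/literal arguments currently have to be 
written in PySpark, wrapping every literal with {{lit}} (a minimal sketch; the 
column names and the existing SparkContext {{sc}} are assumptions):

{noformat}
from pyspark.sql import SQLContext
from pyspark.sql.functions import concat, lit

sqlContext = SQLContext(sc)
df = sqlContext.createDataFrame([("a", "b", "c")], ["colA", "colB", "colC"])

# Every literal separator must be wrapped with lit(); the proposal above would
# allow plain strings to be passed directly.
df.select(concat(df.colA, lit(", "), df.colB, lit(", "), df.colC)).show()
{noformat}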

 string function: concat
 ---

 Key: SPARK-8240
 URL: https://issues.apache.org/jira/browse/SPARK-8240
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin
Assignee: Reynold Xin
 Fix For: 1.5.0


 concat(string|binary A, string|binary B...): string / binary
 Returns the string or bytes resulting from concatenating the strings or bytes 
 passed in as parameters in order. For example, concat('foo', 'bar') results 
 in 'foobar'. Note that this function can take any number of input strings.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8240) string function: concat

2015-08-13 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696477#comment-14696477
 ] 

Yu Ishikawa commented on SPARK-8240:


I think so; that is probably hard. However, from the user's point of view, it 
would be easier to write such an expression without wrapping literals in {{lit}}.

 string function: concat
 ---

 Key: SPARK-8240
 URL: https://issues.apache.org/jira/browse/SPARK-8240
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Reporter: Reynold Xin
Assignee: Reynold Xin
 Fix For: 1.5.0


 concat(string|binary A, string|binary B...): string / binary
 Returns the string or bytes resulting from concatenating the strings or bytes 
 passed in as parameters in order. For example, concat('foo', 'bar') results 
 in 'foobar'. Note that this function can take any number of input strings.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9871) Add expression functions into SparkR which have a variable parameter

2015-08-12 Thread Yu Ishikawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Ishikawa updated SPARK-9871:
---
Summary: Add expression functions into SparkR which have a variable 
parameter  (was: Add expression functions into SparkR which has a variable 
parameter)

 Add expression functions into SparkR which have a variable parameter
 

 Key: SPARK-9871
 URL: https://issues.apache.org/jira/browse/SPARK-9871
 Project: Spark
  Issue Type: Sub-task
  Components: SparkR
Reporter: Yu Ishikawa

 Add expression functions into SparkR which has a variable parameter, like 
 {{concat}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-9871) Add expression functions into SparkR which has a variable parameter

2015-08-12 Thread Yu Ishikawa (JIRA)
Yu Ishikawa created SPARK-9871:
--

 Summary: Add expression functions into SparkR which has a variable 
parameter
 Key: SPARK-9871
 URL: https://issues.apache.org/jira/browse/SPARK-9871
 Project: Spark
  Issue Type: Sub-task
  Components: SparkR
Reporter: Yu Ishikawa


Add expression functions into SparkR which has a variable parameter, like 
{{concat}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9427) Add expression functions in SparkR

2015-08-11 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692600#comment-14692600
 ] 

Yu Ishikawa commented on SPARK-9427:


[~shivaram] After all, I'd like to split this issue into a few sub-issues, since 
it is quite difficult to add the listed expressions at once and a single PR for 
all of them would be hard to review. I think we could classify them into at 
least three types in SparkR. What do you think?

1. Add expressions whose parameters are only {{(Column)}} or {{(Column, 
Column)}}, like {{md5(e: Column)}}
2. Add expressions whose parameters are a little more complicated, like {{conv(num: 
Column, fromBase: Int, toBase: Int)}}
3. Add expressions which conflict with an already existing R generic, like 
{{coalesce(e: Column*)}}

{{1}} is not a difficult task; it is mostly extracting method definitions from the 
Scala code, and we rarely need to consider conflicts with the current SparkR code.
However, {{2}} and {{3}} are a little harder because of the complexity. 
For example, in {{3}}, if we must modify an existing R generic for a new 
expression, we should check whether the modification affects the existing code 
or not.

 Add expression functions in SparkR
 --

 Key: SPARK-9427
 URL: https://issues.apache.org/jira/browse/SPARK-9427
 Project: Spark
  Issue Type: New Feature
  Components: SparkR
Reporter: Yu Ishikawa

 The list of functions to add is based on SQL's functions. And it would be 
 better to add them in one shot PR.
 https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-9855) Add expression functions into SparkR whose params are simple

2015-08-11 Thread Yu Ishikawa (JIRA)
Yu Ishikawa created SPARK-9855:
--

 Summary: Add expression functions into SparkR whose params are 
simple
 Key: SPARK-9855
 URL: https://issues.apache.org/jira/browse/SPARK-9855
 Project: Spark
  Issue Type: Sub-task
  Components: SparkR
Reporter: Yu Ishikawa


Add expression functions whose parameters are only {{(Column)}} or {{(Column, 
Column)}}, like {{md5}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-9857) Add expression functions into SparkR which conflict with the existing R's generic

2015-08-11 Thread Yu Ishikawa (JIRA)
Yu Ishikawa created SPARK-9857:
--

 Summary: Add expression functions into SparkR which conflict with 
the existing R's generic
 Key: SPARK-9857
 URL: https://issues.apache.org/jira/browse/SPARK-9857
 Project: Spark
  Issue Type: Sub-task
  Components: SparkR
Reporter: Yu Ishikawa


Add expression functions into SparkR which conflict with the existing R's 
generic, like {{coalesce(e: Column*)}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9427) Add expression functions in SparkR

2015-08-11 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692777#comment-14692777
 ] 

Yu Ishikawa commented on SPARK-9427:


[~shivaram] I don't figure out the number of each type. However, I estimated 
them as folows. Please be careful that it includes the functions which have 
been added into SparkR.

1 = 50
2 and 3 = 51

 Add expression functions in SparkR
 --

 Key: SPARK-9427
 URL: https://issues.apache.org/jira/browse/SPARK-9427
 Project: Spark
  Issue Type: New Feature
  Components: SparkR
Reporter: Yu Ishikawa

 The list of functions to add is based on SQL's functions. And it would be 
 better to add them in one shot PR.
 https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-9427) Add expression functions in SparkR

2015-08-11 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692777#comment-14692777
 ] 

Yu Ishikawa edited comment on SPARK-9427 at 8/12/15 2:02 AM:
-

[~shivaram] I haven't figured out the exact number of each type. However, I 
estimated them as follows. Please note that these counts include the 
functions which have already been added to SparkR.

1 = 50
2 and 3 = 51


was (Author: yuu.ishik...@gmail.com):
[~shivaram] I don't figure out the number of each type. However, I estimated 
them as folows. Please be careful that it includes the functions which have 
been added into SparkR.

1 = 50
2 and 3 = 51

 Add expression functions in SparkR
 --

 Key: SPARK-9427
 URL: https://issues.apache.org/jira/browse/SPARK-9427
 Project: Spark
  Issue Type: New Feature
  Components: SparkR
Reporter: Yu Ishikawa

 The list of functions to add is based on SQL's functions, and it would be 
 better to add them in a single PR.
 https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9427) Add expression functions in SparkR

2015-08-11 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692607#comment-14692607
 ] 

Yu Ishikawa commented on SPARK-9427:


h3. Memo

These are the expressions we should add (including the ones that already 
exist in SparkR). I extracted them from Scala's {{functions.scala}} with 
{{grep}}.

{noformat}
def abs(e: Column): Column
def acos(columnName: String): Column
def acos(e: Column): Column
def add_months(startDate: Column, numMonths: Int): Column
def approxCountDistinct(columnName: String): Column
def approxCountDistinct(columnName: String, rsd: Double): Column
def approxCountDistinct(e: Column): Column
def approxCountDistinct(e: Column, rsd: Double): Column
def array(colName: String, colNames: String*): Column
def array(cols: Column*): Column
def array_contains(column: Column, value: Any): Column
def asc(columnName: String): Column
def ascii(e: Column): Column
def asin(columnName: String): Column
def asin(e: Column): Column
def atan(columnName: String): Column
def atan(e: Column): Column
def atan2(l: Column, r: Column): Column
def atan2(l: Column, r: Double): Column
def atan2(l: Column, rightName: String): Column
def atan2(l: Double, r: Column): Column
def atan2(l: Double, rightName: String): Column
def atan2(leftName: String, r: Column): Column
def atan2(leftName: String, r: Double): Column
def atan2(leftName: String, rightName: String): Column
def avg(columnName: String): Column
def avg(e: Column): Column
def base64(e: Column): Column
def bin(columnName: String): Column
def bin(e: Column): Column
def bitwiseNOT(e: Column): Column
def cbrt(columnName: String): Column
def cbrt(e: Column): Column
def ceil(columnName: String): Column
def ceil(e: Column): Column
def coalesce(e: Column*): Column
def concat(exprs: Column*): Column
def concat_ws(sep: String, exprs: Column*): Column
def conv(num: Column, fromBase: Int, toBase: Int): Column
def cos(columnName: String): Column
def cos(e: Column): Column
def cosh(columnName: String): Column
def cosh(e: Column): Column
def count(columnName: String): Column
def count(e: Column): Column
def countDistinct(columnName: String, columnNames: String*): Column
def countDistinct(expr: Column, exprs: Column*): Column
def crc32(e: Column): Column
def cumeDist(): Column
def current_date(): Column
def current_timestamp(): Column
def date_add(start: Column, days: Int): Column
def date_format(dateExpr: Column, format: String): Column
def date_sub(start: Column, days: Int): Column
def datediff(end: Column, start: Column): Column
def dayofmonth(e: Column): Column
def dayofyear(e: Column): Column
def decode(value: Column, charset: String): Column
def denseRank(): Column
def desc(columnName: String): Column
def encode(value: Column, charset: String): Column
def exp(columnName: String): Column
def exp(e: Column): Column
def explode(e: Column): Column
def expm1(columnName: String): Column
def expm1(e: Column): Column
def expr(expr: String): Column
def factorial(e: Column): Column
def first(columnName: String): Column
def first(e: Column): Column
def floor(columnName: String): Column
def floor(e: Column): Column
def format_number(x: Column, d: Int): Column
def format_string(format: String, arguments: Column*): Column
def from_unixtime(ut: Column): Column
def from_unixtime(ut: Column, f: String): Column
def from_utc_timestamp(ts: Column, tz: String): Column
def greatest(columnName: String, columnNames: String*): Column
def greatest(exprs: Column*): Column
def hex(column: Column): Column
def hour(e: Column): Column
def hypot(l: Column, r: Column): Column
def hypot(l: Column, r: Double): Column
def hypot(l: Column, rightName: String): Column
def hypot(l: Double, r: Column): Column
def hypot(l: Double, rightName: String): Column
def hypot(leftName: String, r: Column): Column
def hypot(leftName: String, r: Double): Column
def hypot(leftName: String, rightName: String): Column
def initcap(e: Column): Column
def inputFileName(): Column
def instr(str: Column, substring: String): Column
def isNaN(e: Column): Column
def lag(columnName: String, offset: Int): Column
def lag(columnName: String, offset: Int, defaultValue: Any): Column
def lag(e: Column, offset: Int): Column
def lag(e: Column, offset: Int, defaultValue: Any): Column
def last(columnName: String): Column
def last(e: Column): Column
def last_day(e: Column): Column
def lead(columnName: String, offset: Int): Column
def lead(columnName: String, offset: Int, defaultValue: Any): Column
def lead(e: Column, offset: Int): Column
def lead(e: Column, offset: Int, defaultValue: Any): Column
def least(columnName: String, columnNames: String*): Column
def least(exprs: Column*): Column
def length(e: Column): Column
def levenshtein(l: Column, r: Column): Column
def lit(literal: Any): Column
def locate(substr: String, str: Column): Column
def locate(substr: String, str: Column, pos: Int): Column
def log(base: Double, a: Column): Column
def 

[jira] [Created] (SPARK-9856) Add expression functions into SparkR whose params are complicated

2015-08-11 Thread Yu Ishikawa (JIRA)
Yu Ishikawa created SPARK-9856:
--

 Summary: Add expression functions into SparkR whose params are 
complicated
 Key: SPARK-9856
 URL: https://issues.apache.org/jira/browse/SPARK-9856
 Project: Spark
  Issue Type: Sub-task
  Components: SparkR
Reporter: Yu Ishikawa


Add expression functions whose parameters are a little complicated, like 
{{regexp_extract(e: Column, exp: String, groupIdx: Int)}} and 
{{regexp_replace(e: Column, pattern: String, replacement: String)}}.
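
A hedged sketch of the multi-parameter pattern, again assuming SparkR's 
internal {{callJStatic}}/{{column}} helpers (the signature shown is 
illustrative):

{noformat}
# Sketch only: the extra parameters are plain R values mapped onto the
# String/Int parameters of the Scala function.
setGeneric("regexp_extract",
           function(x, pattern, idx) { standardGeneric("regexp_extract") })

setMethod("regexp_extract",
          signature(x = "Column", pattern = "character", idx = "numeric"),
          function(x, pattern, idx) {
            jc <- callJStatic("org.apache.spark.sql.functions", "regexp_extract",
                              x@jc, pattern, as.integer(idx))
            column(jc)
          })
{noformat}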



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8505) Add settings to kick `lint-r` from `./dev/run-test.py`

2015-08-01 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14650233#comment-14650233
 ] 

Yu Ishikawa commented on SPARK-8505:


[~srowen] Yes, I understand how issues get assigned, but I thought it would 
be better to show my activity to the other developers. I'll be more careful 
next time. Thanks!

 Add settings to kick `lint-r` from `./dev/run-test.py`
 --

 Key: SPARK-8505
 URL: https://issues.apache.org/jira/browse/SPARK-8505
 Project: Spark
  Issue Type: Sub-task
  Components: SparkR
Reporter: Yu Ishikawa

 Add some settings to kick `lint-r` script from `./dev/run-test.py`



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8505) Add settings to kick `lint-r` from `./dev/run-test.py`

2015-07-31 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649839#comment-14649839
 ] 

Yu Ishikawa commented on SPARK-8505:


Could you assign this issue to me?

 Add settings to kick `lint-r` from `./dev/run-test.py`
 --

 Key: SPARK-8505
 URL: https://issues.apache.org/jira/browse/SPARK-8505
 Project: Spark
  Issue Type: Sub-task
  Components: SparkR
Reporter: Yu Ishikawa

 Add some settings to kick `lint-r` script from `./dev/run-test.py`



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-9427) Add expression functions in SparkR

2015-07-29 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645464#comment-14645464
 ] 

Yu Ishikawa edited comment on SPARK-9427 at 7/29/15 11:33 PM:
--

I'll work on this issue.


was (Author: yuu.ishik...@gmail.com):
I'll work this issue.

 Add expression functions in SparkR
 --

 Key: SPARK-9427
 URL: https://issues.apache.org/jira/browse/SPARK-9427
 Project: Spark
  Issue Type: Sub-task
  Components: SparkR
Reporter: Yu Ishikawa

 The list of functions to add is based on SQL's functions, and it would be 
 better to add them in a single PR.
 https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-8917) Add @since tags to mllib.linalg

2015-07-28 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-8917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645248#comment-14645248
 ] 

Yu Ishikawa commented on SPARK-8917:


Please assign this issue to me.

 Add @since tags to mllib.linalg
 ---

 Key: SPARK-8917
 URL: https://issues.apache.org/jira/browse/SPARK-8917
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, MLlib
Reporter: Xiangrui Meng
Priority: Minor
  Labels: starter
   Original Estimate: 2h
  Remaining Estimate: 2h





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-9427) Add expression functions in SparkR

2015-07-28 Thread Yu Ishikawa (JIRA)
Yu Ishikawa created SPARK-9427:
--

 Summary: Add expression functions in SparkR
 Key: SPARK-9427
 URL: https://issues.apache.org/jira/browse/SPARK-9427
 Project: Spark
  Issue Type: Sub-task
  Components: SparkR
Reporter: Yu Ishikawa


The list of functions to add is based on SQL's functions, and it would be 
better to add them in a single PR.
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9427) Add expression functions in SparkR

2015-07-28 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645464#comment-14645464
 ] 

Yu Ishikawa commented on SPARK-9427:


I'll work on this issue.

 Add expression functions in SparkR
 --

 Key: SPARK-9427
 URL: https://issues.apache.org/jira/browse/SPARK-9427
 Project: Spark
  Issue Type: Sub-task
  Components: SparkR
Reporter: Yu Ishikawa

 The list of functions to add is based on SQL's functions, and it would be 
 better to add them in a single PR.
 https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9248) Closing curly-braces should always be on their own line

2015-07-24 Thread Yu Ishikawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Ishikawa updated SPARK-9248:
---
Description: 
Closing curly-braces should always be on their own line

For example,
{noformat}
inst/tests/test_sparkSQL.R:606:3: style: Closing curly-braces should always be 
on their own line, unless it's followed by an else.
  }, error = function(err) {
  ^
{noformat}

  was:Closing curly-braces should always be on their own line


 Closing curly-braces should always be on their own line
 ---

 Key: SPARK-9248
 URL: https://issues.apache.org/jira/browse/SPARK-9248
 Project: Spark
  Issue Type: Sub-task
  Components: SparkR
Reporter: Yu Ishikawa
Priority: Minor

 Closing curly-braces should always be on their own line
 For example,
 {noformat}
 inst/tests/test_sparkSQL.R:606:3: style: Closing curly-braces should always 
 be on their own line, unless it's followed by an else.
   }, error = function(err) {
   ^
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9248) Closing curly-braces should always be on their own line

2015-07-24 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14640074#comment-14640074
 ] 

Yu Ishikawa commented on SPARK-9248:


Yeah - sorry for not explaining enough. {{dev/lint-r}} doesn't currently 
catch the warnings about {{\} else \{}}. There are a few warnings like the 
one above.
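
For illustration, the style the rule asks for looks roughly like this 
(made-up snippet, not taken from the code base):

{noformat}
# Flagged: the closing brace shares its line with other code
if (x > 0) {
  y <- 1 }

# Preferred: the closing brace sits on its own line,
# unless it is immediately followed by an else
if (x > 0) {
  y <- 1
} else {
  y <- -1
}
{noformat}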

 Closing curly-braces should always be on their own line
 ---

 Key: SPARK-9248
 URL: https://issues.apache.org/jira/browse/SPARK-9248
 Project: Spark
  Issue Type: Sub-task
  Components: SparkR
Reporter: Yu Ishikawa
Priority: Minor

 Closing curly-braces should always be on their own line
 For example,
 {noformat}
 inst/tests/test_sparkSQL.R:606:3: style: Closing curly-braces should always 
 be on their own line, unless it's followed by an else.
   }, error = function(err) {
   ^
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-9249) local variable assigned but may not be used

2015-07-23 Thread Yu Ishikawa (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-9249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yu Ishikawa updated SPARK-9249:
---
Description: 
local variable assigned but may not be used

For example:

{noformat}
R/deserialize.R:105:3: warning: local variable ‘data’ assigned but may not be used
  data <- readBin(con, raw(), as.integer(dataLen), endian = "big")
  ^~~~
R/deserialize.R:109:3: warning: local variable ‘data’ assigned but may not be used
  data <- readBin(con, raw(), as.integer(dataLen), endian = "big")
  ^~~~
{noformat}

  was:local variable assigned but may not be used


 local variable assigned but may not be used
 ---

 Key: SPARK-9249
 URL: https://issues.apache.org/jira/browse/SPARK-9249
 Project: Spark
  Issue Type: Sub-task
  Components: SparkR
Reporter: Yu Ishikawa
Priority: Minor

 local variable assigned but may not be used
 For example:
 {noformat}
 R/deserialize.R:105:3: warning: local variable ‘data’ assigned but may not be used
   data <- readBin(con, raw(), as.integer(dataLen), endian = "big")
   ^~~~
 R/deserialize.R:109:3: warning: local variable ‘data’ assigned but may not be used
   data <- readBin(con, raw(), as.integer(dataLen), endian = "big")
   ^~~~
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-9249) local variable assigned but may not be used

2015-07-23 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14639945#comment-14639945
 ] 

Yu Ishikawa edited comment on SPARK-9249 at 7/24/15 5:22 AM:
-

[~chanchal.spark] Yes. I think we should remove local variables which are not 
used, such as below.
https://github.com/apache/spark/blob/branch-1.4/R/pkg/R/deserialize.R#L104


was (Author: yuu.ishik...@gmail.com):
[~chanchal.spark] Yes. I think we should remove local variables which is not 
used, such as below.
https://github.com/apache/spark/blob/branch-1.4/R/pkg/R/deserialize.R#L104

 local variable assigned but may not be used
 ---

 Key: SPARK-9249
 URL: https://issues.apache.org/jira/browse/SPARK-9249
 Project: Spark
  Issue Type: Sub-task
  Components: SparkR
Reporter: Yu Ishikawa
Priority: Minor

 local variable assigned but may not be used
 For example:
 {noformat}
 R/deserialize.R:105:3: warning: local variable ‘data’ assigned but may not be used
   data <- readBin(con, raw(), as.integer(dataLen), endian = "big")
   ^~~~
 R/deserialize.R:109:3: warning: local variable ‘data’ assigned but may not be used
   data <- readBin(con, raw(), as.integer(dataLen), endian = "big")
   ^~~~
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9249) local variable assigned but may not be used

2015-07-23 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14639945#comment-14639945
 ] 

Yu Ishikawa commented on SPARK-9249:


[~chanchal.spark] Yes. I think we should remove local variables which are not 
used, such as below.
https://github.com/apache/spark/blob/branch-1.4/R/pkg/R/deserialize.R#L104
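
For illustration, the fix amounts to dropping the dead assignment while 
keeping the read itself (hypothetical function name; the real code is in 
{{R/deserialize.R}}):

{noformat}
# Before: lintr warns because `data` is assigned but never used
readSkip <- function(con, dataLen) {
  data <- readBin(con, raw(), as.integer(dataLen), endian = "big")
  NULL
}

# After: keep the side-effecting read, drop the unused variable
readSkip <- function(con, dataLen) {
  readBin(con, raw(), as.integer(dataLen), endian = "big")
  NULL
}
{noformat}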

 local variable assigned but may not be used
 ---

 Key: SPARK-9249
 URL: https://issues.apache.org/jira/browse/SPARK-9249
 Project: Spark
  Issue Type: Sub-task
  Components: SparkR
Reporter: Yu Ishikawa
Priority: Minor

 local variable assigned but may not be used
 For example:
 {noformat}
 R/deserialize.R:105:3: warning: local variable ‘data’ assigned but may not be used
   data <- readBin(con, raw(), as.integer(dataLen), endian = "big")
   ^~~~
 R/deserialize.R:109:3: warning: local variable ‘data’ assigned but may not be used
   data <- readBin(con, raw(), as.integer(dataLen), endian = "big")
   ^~~~
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-9249) local variable assigned but may not be used

2015-07-23 Thread Yu Ishikawa (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14639955#comment-14639955
 ] 

Yu Ishikawa commented on SPARK-9249:


I'm working on this issue.

 local variable assigned but may not be used
 ---

 Key: SPARK-9249
 URL: https://issues.apache.org/jira/browse/SPARK-9249
 Project: Spark
  Issue Type: Sub-task
  Components: SparkR
Reporter: Yu Ishikawa
Priority: Minor

 local variable assigned but may not be used
 For example:
 {noformat}
 R/deserialize.R:105:3: warning: local variable ‘data’ assigned but may not be used
   data <- readBin(con, raw(), as.integer(dataLen), endian = "big")
   ^~~~
 R/deserialize.R:109:3: warning: local variable ‘data’ assigned but may not be used
   data <- readBin(con, raw(), as.integer(dataLen), endian = "big")
   ^~~~
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-9248) Closing curly-braces should always be on their own line

2015-07-22 Thread Yu Ishikawa (JIRA)
Yu Ishikawa created SPARK-9248:
--

 Summary: Closing curly-braces should always be on their own line
 Key: SPARK-9248
 URL: https://issues.apache.org/jira/browse/SPARK-9248
 Project: Spark
  Issue Type: Sub-task
  Components: SparkR
Reporter: Yu Ishikawa
Priority: Minor


Closing curly-braces should always be on their own line



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-9249) local variable assigned but may not be used

2015-07-22 Thread Yu Ishikawa (JIRA)
Yu Ishikawa created SPARK-9249:
--

 Summary: local variable assigned but may not be used
 Key: SPARK-9249
 URL: https://issues.apache.org/jira/browse/SPARK-9249
 Project: Spark
  Issue Type: Sub-task
  Components: SparkR
Reporter: Yu Ishikawa
Priority: Minor


local variable assigned but may not be used



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


