[jira] [Commented] (SPARK-1359) SGD implementation is not efficient
[ https://issues.apache.org/jira/browse/SPARK-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15219190#comment-15219190 ] Yu Ishikawa commented on SPARK-1359: [~mbaddar] Since the current ANN in mllib depends on `GradientDescent`, we should improve its efficiency. How should we evaluate a new implementation against the current one? And what tasks are best suited for the evaluation? - Metrics 1. Convergence Efficiency 2. Compute Cost 3. Compute Time 4. Other - Tasks 1. Logistic Regression and Linear Regression with randomly generated data 2. Logistic Regression and Linear Regression with any Kaggle data 3. Other I made an implementation of Parallelized Stochastic Gradient Descent. https://github.com/yu-iskw/spark-parallelized-sgd > SGD implementation is not efficient > --- > > Key: SPARK-1359 > URL: https://issues.apache.org/jira/browse/SPARK-1359 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 0.9.0, 1.0.0 >Reporter: Xiangrui Meng > > The SGD implementation samples a mini-batch to compute the stochastic > gradient. This is not efficient because examples are provided via an iterator > interface. We have to scan all of them to obtain a sample.
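To make the cost concrete, here is a minimal Scala sketch of the sampling pattern the issue describes. It is illustrative of a {{GradientDescent}}-style loop, not MLlib's actual code, and {{runSGD}} is a name invented for this example.

{code}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

// Each iteration draws a mini-batch with RDD.sample. Bernoulli sampling has
// to consume every element of each partition's iterator to decide what to
// keep, so the whole data set is scanned once per iteration even when
// miniBatchFraction is tiny.
def runSGD(data: RDD[(Double, Vector)],
           numIterations: Int,
           miniBatchFraction: Double): Unit = {
  for (i <- 1 to numIterations) {
    // O(n) scan to keep roughly (miniBatchFraction * n) examples.
    val miniBatch = data.sample(withReplacement = false, miniBatchFraction, seed = 42 + i)
    // ... aggregate the stochastic gradient over miniBatch and update the weights ...
  }
}
{code}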
[jira] [Created] (SPARK-13265) Refactoring of basic ML import/export for file systems other than HDFS
Yu Ishikawa created SPARK-13265: --- Summary: Refactoring of basic ML import/export for file systems other than HDFS Key: SPARK-13265 URL: https://issues.apache.org/jira/browse/SPARK-13265 Project: Spark Issue Type: Bug Components: ML Reporter: Yu Ishikawa We can't save a model to file systems other than HDFS, for example Amazon S3, because the file system is fixed in Spark 1.6. https://github.com/apache/spark/blob/v1.6.0/mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala#L78 {noformat} scala> val kmeans = new KMeans().setK(2) scala> model.write.overwrite().save("s3n://test-bucket/tmp/test-kmeans/") java.lang.IllegalArgumentException: Wrong FS: s3n://test-bucket/tmp/test-kmeans, expected: hdfs://ec2-54-248-42-97.ap-northeast-1.compute.amazonaws.com:9000 at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:590) at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:170) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803) at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332) at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:80) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:41) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:43) at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:45) at $iwC$$iwC$$iwC$$iwC.<init>(<console>:47) at $iwC$$iwC$$iwC.<init>(<console>:49) at $iwC$$iwC.<init>(<console>:51) at $iwC.<init>(<console>:53) at <init>(<console>:55) at .<init>(<console>:59) at .<clinit>(<console>) at .<init>(<console>:7) at .<clinit>(<console>) at $print(<console>) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346) at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819) at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857) at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902) at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814) at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657) at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665) at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945) at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059) at org.apache.spark.repl.Main$.main(Main.scala:31) at org.apache.spark.repl.Main.main(Main.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) {noformat}
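A minimal sketch of one possible fix, under the assumption that the bug is the use of the cluster's default (HDFS-bound) file system: resolve the {{FileSystem}} from the output path's scheme instead. {{resolveOutputFs}} is a helper invented for this illustration, not Spark's actual code.

{code}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// FileSystem.get(conf) always returns the cluster's default file system
// (HDFS in the trace above), so an s3n:// path fails checkPath(). Resolving
// the file system from the path itself picks the matching implementation
// (S3, local, HDFS, ...).
def resolveOutputFs(path: String, hadoopConf: Configuration): (FileSystem, Path) = {
  val outputPath = new Path(path)
  val fs = outputPath.getFileSystem(hadoopConf) // scheme-aware lookup
  (fs, outputPath.makeQualified(fs.getUri, fs.getWorkingDirectory))
}
{code}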
[jira] [Updated] (SPARK-13265) Refactoring of basic ML import/export for file systems other than HDFS
[ https://issues.apache.org/jira/browse/SPARK-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Ishikawa updated SPARK-13265: Description: We can't save a model to file systems other than HDFS, for example Amazon S3, because the file system is fixed in Spark 1.6. https://github.com/apache/spark/blob/v1.6.0/mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala#L78 When I tried to export a KMeans model into Amazon S3: {noformat} scala> val kmeans = new KMeans().setK(2) scala> model.write.overwrite().save("s3n://test-bucket/tmp/test-kmeans/") java.lang.IllegalArgumentException: Wrong FS: s3n://test-bucket/tmp/test-kmeans, expected: hdfs://ec2-54-248-42-97.ap-northeast-1.compute.amazonaws.com:9000 at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:590) at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:170) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803) at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332) at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:80) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:41) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:43) at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:45) at $iwC$$iwC$$iwC$$iwC.<init>(<console>:47) at $iwC$$iwC$$iwC.<init>(<console>:49) at $iwC$$iwC.<init>(<console>:51) at $iwC.<init>(<console>:53) at <init>(<console>:55) at .<init>(<console>:59) at .<clinit>(<console>) at .<init>(<console>:7) at .<clinit>(<console>) at $print(<console>) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346) at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819) at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857) at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902) at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814) at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657) at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665) at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945) at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059) at org.apache.spark.repl.Main$.main(Main.scala:31) at org.apache.spark.repl.Main.main(Main.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) {noformat} was: We can't save a model to file systems other than HDFS, for example Amazon S3, because the file system is fixed in Spark 1.6. https://github.com/apache/spark/blob/v1.6.0/mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala#L78 {noformat} scala> val kmeans = new KMeans().setK(2) scala> model.write.overwrite().save("s3n://test-bucket/tmp/test-kmeans/") java.lang.IllegalArgumentException: Wrong FS: s3n://test-bucket/tmp/test-kmeans, expected: hdfs://ec2-54-248-42-97.ap-northeast-1.compute.amazonaws.com:9000 at
[jira] [Updated] (SPARK-13265) Refactoring of basic ML import/export for file systems other than HDFS
[ https://issues.apache.org/jira/browse/SPARK-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Ishikawa updated SPARK-13265: Description: We can't save a model to file systems other than HDFS, for example Amazon S3, because the file system is fixed in Spark 1.6. https://github.com/apache/spark/blob/v1.6.0/mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala#L78 When I tried to export a KMeans model into Amazon S3: {noformat} scala> val kmeans = new KMeans().setK(2) scala> kmeans.fit(train) scala> model.write.overwrite().save("s3n://test-bucket/tmp/test-kmeans/") java.lang.IllegalArgumentException: Wrong FS: s3n://test-bucket/tmp/test-kmeans, expected: hdfs://ec2-54-248-42-97.ap-northeast-1.compute.amazonaws.com:9000 at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:590) at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:170) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803) at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332) at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:80) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:41) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:43) at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:45) at $iwC$$iwC$$iwC$$iwC.<init>(<console>:47) at $iwC$$iwC$$iwC.<init>(<console>:49) at $iwC$$iwC.<init>(<console>:51) at $iwC.<init>(<console>:53) at <init>(<console>:55) at .<init>(<console>:59) at .<clinit>(<console>) at .<init>(<console>:7) at .<clinit>(<console>) at $print(<console>) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346) at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819) at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857) at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902) at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814) at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657) at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665) at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945) at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059) at org.apache.spark.repl.Main$.main(Main.scala:31) at org.apache.spark.repl.Main.main(Main.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) {noformat} was: We can't save a model to file systems other than HDFS, for example Amazon S3, because the file system is fixed in Spark 1.6. https://github.com/apache/spark/blob/v1.6.0/mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala#L78 When I tried to export a KMeans model into Amazon S3: {noformat} scala> val kmeans = new KMeans().setK(2) scala> model.write.overwrite().save("s3n://test-bucket/tmp/test-kmeans/") java.lang.IllegalArgumentException: Wrong FS: s3n://test-bucket/tmp/test-kmeans, expected:
[jira] [Updated] (SPARK-13265) Refactoring of basic ML import/export for file systems other than HDFS
[ https://issues.apache.org/jira/browse/SPARK-13265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Ishikawa updated SPARK-13265: Description: We can't save a model to file systems other than HDFS, for example Amazon S3, because the file system is fixed in Spark 1.6. https://github.com/apache/spark/blob/v1.6.0/mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala#L78 When I tried to export a KMeans model into Amazon S3, I got the following error. {noformat} scala> val kmeans = new KMeans().setK(2) scala> val model = kmeans.fit(train) scala> model.write.overwrite().save("s3n://test-bucket/tmp/test-kmeans/") java.lang.IllegalArgumentException: Wrong FS: s3n://test-bucket/tmp/test-kmeans, expected: hdfs://ec2-54-248-42-97.ap-northeast-1.compute.amazonaws.com:9000 at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:590) at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:170) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:803) at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1332) at org.apache.spark.ml.util.MLWriter.save(ReadWrite.scala:80) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:41) at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:43) at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:45) at $iwC$$iwC$$iwC$$iwC.<init>(<console>:47) at $iwC$$iwC$$iwC.<init>(<console>:49) at $iwC$$iwC.<init>(<console>:51) at $iwC.<init>(<console>:53) at <init>(<console>:55) at .<init>(<console>:59) at .<clinit>(<console>) at .<init>(<console>:7) at .<clinit>(<console>) at $print(<console>) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065) at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346) at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871) at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819) at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857) at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902) at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814) at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657) at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665) at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945) at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135) at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945) at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059) at org.apache.spark.repl.Main$.main(Main.scala:31) at org.apache.spark.repl.Main.main(Main.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) {noformat} was: We can't save a model to file systems other than HDFS, for example Amazon S3, because the file system is fixed in Spark 1.6. https://github.com/apache/spark/blob/v1.6.0/mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala#L78 When I tried to export a KMeans model into Amazon S3: {noformat} scala> val kmeans = new KMeans().setK(2) scala> kmeans.fit(train) scala> model.write.overwrite().save("s3n://test-bucket/tmp/test-kmeans/") java.lang.IllegalArgumentException: Wrong FS:
[jira] [Commented] (SPARK-11618) Refactoring of basic ML import/export
[ https://issues.apache.org/jira/browse/SPARK-11618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15140103#comment-15140103 ] Yu Ishikawa commented on SPARK-11618: - Hi [~josephkb], I have a question about ML import/export. It seems that the current save method only supports saving a model on HDFS. Do we have any plan to support saving one to other file systems, such as Amazon S3? https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala#L77 > Refactoring of basic ML import/export > - > > Key: SPARK-11618 > URL: https://issues.apache.org/jira/browse/SPARK-11618 > Project: Spark > Issue Type: Sub-task > Components: ML >Reporter: Joseph K. Bradley >Assignee: Joseph K. Bradley > Fix For: 1.6.0 > > > This is for a few updates to the original PR for basic ML import/export in > [SPARK-11217]. > * The original PR diverges from the design doc in that it does not include > the Spark version or a model format version. We should include the Spark > version in the metadata. If we do that, then we don't really need a model > format version. > * Proposal: DefaultParamsWriter includes two separable pieces of logic in > save(): (a) handling overwriting and (b) saving Params. I want to separate > these by putting (a) in a save() method in Writer which calls an abstract > saveImpl, and (b) in the saveImpl implementation in DefaultParamsWriter. > This is described below: > {code} > abstract class Writer { > def save(path: String) = { > // handle overwrite > saveImpl(path) > } > def saveImpl(path: String) // abstract > } > class DefaultParamsWriter extends Writer { > def saveImpl(path: String) = { > // save Params > } > } > {code}
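To make the proposed split concrete, here is a hedged sketch of the pattern; the overwrite handling and helper details below are assumptions for illustration, not Spark's actual implementation.

{code}
import org.apache.hadoop.fs.Path
import org.apache.spark.SparkContext

// save() owns the overwrite policy once; saveImpl() is the extension point.
abstract class Writer {
  protected var shouldOverwrite: Boolean = false
  def overwrite(): this.type = { shouldOverwrite = true; this }

  def save(sc: SparkContext, path: String): Unit = {
    val p = new Path(path)
    val fs = p.getFileSystem(sc.hadoopConfiguration) // scheme-aware (cf. SPARK-13265)
    if (fs.exists(p)) {
      if (shouldOverwrite) fs.delete(p, true)
      else sys.error(s"Path $path already exists; call overwrite() to replace it.")
    }
    saveImpl(sc, path)
  }

  protected def saveImpl(sc: SparkContext, path: String): Unit
}

class DefaultParamsWriter extends Writer {
  protected def saveImpl(sc: SparkContext, path: String): Unit = {
    // write metadata (Spark version, uid, Params as JSON) under path/metadata
  }
}
{code}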
[jira] [Created] (SPARK-13239) Click-Through Rate Prediction
Yu Ishikawa created SPARK-13239: --- Summary: Click-Through Rate Prediction Key: SPARK-13239 URL: https://issues.apache.org/jira/browse/SPARK-13239 Project: Spark Issue Type: Sub-task Components: ML Reporter: Yu Ishikawa Priority: Minor Apply ML Pipeline API to Click-Through Rate Prediction https://www.kaggle.com/c/avazu-ctr-prediction
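As a rough idea of what applying the Pipeline API here could look like, a hedged Scala sketch follows. The column names ({{site_id}}, {{device_type}}, {{click}}) come from the Avazu data; the stage choices and {{trainingDF}} are assumptions, not a committed design.

{code}
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer, VectorAssembler}

// Index and one-hot encode a high-cardinality categorical column, assemble
// the feature vector, and fit a logistic regression on the click label.
val siteIndexer = new StringIndexer().setInputCol("site_id").setOutputCol("site_idx")
val siteEncoder = new OneHotEncoder().setInputCol("site_idx").setOutputCol("site_vec")
val assembler = new VectorAssembler()
  .setInputCols(Array("site_vec", "device_type"))
  .setOutputCol("features")
val lr = new LogisticRegression().setLabelCol("click").setFeaturesCol("features")

val pipeline = new Pipeline().setStages(Array(siteIndexer, siteEncoder, assembler, lr))
// val model = pipeline.fit(trainingDF) // trainingDF: DataFrame of Avazu rows
{code}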
[jira] [Commented] (SPARK-13239) Click-Through Rate Prediction
[ https://issues.apache.org/jira/browse/SPARK-13239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15138490#comment-15138490 ] Yu Ishikawa commented on SPARK-13239: - I'm working on this issue. > Click-Through Rate Prediction > - > > Key: SPARK-13239 > URL: https://issues.apache.org/jira/browse/SPARK-13239 > Project: Spark > Issue Type: Sub-task > Components: ML >Reporter: Yu Ishikawa >Priority: Minor > > Apply ML Pipeline API to Click-Through Rate Prediction > https://www.kaggle.com/c/avazu-ctr-prediction
[jira] [Commented] (SPARK-10870) Criteo Display Advertising Challenge
[ https://issues.apache.org/jira/browse/SPARK-10870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110060#comment-15110060 ] Yu Ishikawa commented on SPARK-10870: - [~prudenko] Should we test the Kaggle data with the GBDT encoder used by the winning solution? > Criteo Display Advertising Challenge > > > Key: SPARK-10870 > URL: https://issues.apache.org/jira/browse/SPARK-10870 > Project: Spark > Issue Type: Sub-task > Components: ML >Reporter: Peter Rudenko > > Very useful dataset for testing pipelines because: > # "Big data" dataset - the original Kaggle competition dataset is 12 GB, but > there's a [1TB|http://labs.criteo.com/downloads/download-terabyte-click-logs/] > dataset of the same schema as well. > # Sparse models - categorical features have high cardinality > # Reproducible results - because it's public and many other distributed > machine learning libraries (e.g. > [wormhole|https://github.com/dmlc/wormhole/blob/master/doc/tutorial/criteo_kaggle.rst], > [parameter > server|https://github.com/dmlc/parameter_server/blob/master/example/linear/criteo/README.md], > [azure > ml|https://azure.microsoft.com/en-us/documentation/articles/machine-learning-data-science-process-hive-criteo-walkthrough/#mltasks] > etc.) have published baseline benchmarks against which we can compare. > I have some baseline results with custom models (GBDT encoders and tuned LR) > on spark-1.4. Will make pipelines using the public Spark models. The [winning > solution|http://www.csie.ntu.edu.tw/~r01922136/kaggle-2014-criteo.pdf] used a > GBDT encoder (not available in Spark, but not difficult to make one from the > GBT in mllib) + hashing + factorization machine (planned for spark-1.6).
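For reference, a hedged sketch of what a GBDT encoder built from mllib's GBT could look like: each example is mapped to the id of the leaf it reaches in every tree, and those leaf ids become categorical features for a downstream linear model. {{leafId}} and {{encode}} are helpers written here for illustration, not mllib APIs.

{code}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.tree.configuration.FeatureType
import org.apache.spark.mllib.tree.model.{GradientBoostedTreesModel, Node}

// Walk one tree from the root and return the id of the leaf the example lands in.
def leafId(node: Node, features: Vector): Int = {
  if (node.isLeaf) {
    node.id
  } else {
    val split = node.split.get
    val goLeft = split.featureType match {
      case FeatureType.Continuous => features(split.feature) <= split.threshold
      case FeatureType.Categorical => split.categories.contains(features(split.feature))
    }
    if (goLeft) leafId(node.leftNode.get, features)
    else leafId(node.rightNode.get, features)
  }
}

// One leaf id per tree; hash or one-hot encode these for the linear model.
def encode(model: GradientBoostedTreesModel, features: Vector): Array[Int] =
  model.trees.map(tree => leafId(tree.topNode, features))
{code}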
[jira] [Commented] (SPARK-12215) User guide section for KMeans in spark.ml
[ https://issues.apache.org/jira/browse/SPARK-12215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15049975#comment-15049975 ] Yu Ishikawa commented on SPARK-12215: - I'll work on this issue. > User guide section for KMeans in spark.ml > - > > Key: SPARK-12215 > URL: https://issues.apache.org/jira/browse/SPARK-12215 > Project: Spark > Issue Type: Documentation > Components: Documentation, ML >Reporter: Joseph K. Bradley >Assignee: Yu Ishikawa > > [~yuu.ishik...@gmail.com] Will you have time to add a user guide section for > this? Thanks in advance!
[jira] [Commented] (SPARK-12215) User guide section for KMeans in spark.ml
[ https://issues.apache.org/jira/browse/SPARK-12215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15050078#comment-15050078 ] Yu Ishikawa commented on SPARK-12215: - I have sent a pull request for this issue. https://github.com/apache/spark/pull/10244 > User guide section for KMeans in spark.ml > - > > Key: SPARK-12215 > URL: https://issues.apache.org/jira/browse/SPARK-12215 > Project: Spark > Issue Type: Documentation > Components: Documentation, ML >Reporter: Joseph K. Bradley >Assignee: Yu Ishikawa > > [~yuu.ishik...@gmail.com] Will you have time to add a user guide section for > this? Thanks in advance!
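For context, this is the sort of snippet such a guide section typically includes; a hedged example assuming a DataFrame {{dataset}} with a {{features}} vector column, not the exact text of the merged guide.

{code}
import org.apache.spark.ml.clustering.KMeans

// Train a k-means model on the "features" column and inspect the result.
val kmeans = new KMeans().setK(2).setSeed(1L)
val model = kmeans.fit(dataset)

// Within-set sum of squared errors, and the learned cluster centers.
val wssse = model.computeCost(dataset)
model.clusterCenters.foreach(println)
{code}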
[jira] [Commented] (SPARK-10285) Add @since annotation to pyspark.ml.util
[ https://issues.apache.org/jira/browse/SPARK-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15050085#comment-15050085 ] Yu Ishikawa commented on SPARK-10285: - [~mengxr] can we close this issue? [~davies] told me that there are no public methods under pyspark.ml.util. https://github.com/apache/spark/pull/8695#issuecomment-139377373 > Add @since annotation to pyspark.ml.util > > > Key: SPARK-10285 > URL: https://issues.apache.org/jira/browse/SPARK-10285 > Project: Spark > Issue Type: Sub-task > Components: Documentation, ML, PySpark >Reporter: Xiangrui Meng >Assignee: Yu Ishikawa >Priority: Minor > Labels: starter
[jira] [Commented] (SPARK-6518) Add example code and user guide for bisecting k-means
[ https://issues.apache.org/jira/browse/SPARK-6518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15025717#comment-15025717 ] Yu Ishikawa commented on SPARK-6518: All right. I'll send a PR soon. Thanks! > Add example code and user guide for bisecting k-means > - > > Key: SPARK-6518 > URL: https://issues.apache.org/jira/browse/SPARK-6518 > Project: Spark > Issue Type: Documentation > Components: MLlib >Reporter: Yu Ishikawa >Assignee: Yu Ishikawa
[jira] [Commented] (SPARK-6518) Add example code and user guide for bisecting k-means
[ https://issues.apache.org/jira/browse/SPARK-6518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15025995#comment-15025995 ] Yu Ishikawa commented on SPARK-6518: Can I split this issue into docs and an example? > Add example code and user guide for bisecting k-means > - > > Key: SPARK-6518 > URL: https://issues.apache.org/jira/browse/SPARK-6518 > Project: Spark > Issue Type: Documentation > Components: MLlib >Reporter: Yu Ishikawa >Assignee: Yu Ishikawa
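For the example-code half, a hedged sketch of what the spark.mllib example could look like (assuming a REPL-style {{sc}}; not the exact code that was merged):

{code}
import org.apache.spark.mllib.clustering.BisectingKMeans
import org.apache.spark.mllib.linalg.Vectors

// Two well-separated groups of points.
val data = sc.parallelize(Seq(
  Vectors.dense(0.1, 0.1), Vectors.dense(0.3, 0.3),
  Vectors.dense(10.1, 10.1), Vectors.dense(10.3, 10.3)))

// Bisecting k-means splits the data top-down until k leaf clusters remain.
val bkm = new BisectingKMeans().setK(2)
val model = bkm.run(data)

println(s"Compute cost: ${model.computeCost(data)}")
model.clusterCenters.zipWithIndex.foreach { case (center, idx) =>
  println(s"Cluster $idx center: $center")
}
{code}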
[jira] [Commented] (SPARK-8459) Add import/export to spark.mllib bisecting k-means
[ https://issues.apache.org/jira/browse/SPARK-8459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15003015#comment-15003015 ] Yu Ishikawa commented on SPARK-8459: I'm working on this issue. > Add import/export to spark.mllib bisecting k-means > -- > > Key: SPARK-8459 > URL: https://issues.apache.org/jira/browse/SPARK-8459 > Project: Spark > Issue Type: New Feature > Components: MLlib >Reporter: Yu Ishikawa > Labels: 1.7.0
[jira] [Commented] (SPARK-11664) Add methods to get bisecting k-means cluster structure
[ https://issues.apache.org/jira/browse/SPARK-11664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15003085#comment-15003085 ] Yu Ishikawa commented on SPARK-11664: - [~srowen] thank you for letting me know. I intended to set it as "Labels". > Add methods to get bisecting k-means cluster structure > -- > > Key: SPARK-11664 > URL: https://issues.apache.org/jira/browse/SPARK-11664 > Project: Spark > Issue Type: Sub-task > Components: MLlib >Reporter: Yu Ishikawa >Priority: Minor > > I think users want to visualize the result of bisecting k-means clustering as > a dendrogram in order to confirm it. So it would be great to support methods > to get the cluster tree structure as an adjacency list, a linkage matrix, and so > on.
[jira] [Created] (SPARK-11666) Find the best `k` by cutting bisecting k-means cluster tree without recomputation
Yu Ishikawa created SPARK-11666: --- Summary: Find the best `k` by cutting bisecting k-means cluster tree without recomputation Key: SPARK-11666 URL: https://issues.apache.org/jira/browse/SPARK-11666 Project: Spark Issue Type: Sub-task Components: MLlib Reporter: Yu Ishikawa Priority: Minor For example, scikit-learn's hierarchical clustering supports extracting a partial tree from the result. We should support a similar feature in order to reduce computation cost.
[jira] [Created] (SPARK-11664) Add methods to get bisecting k-means cluster structure
Yu Ishikawa created SPARK-11664: --- Summary: Add methods to get bisecting k-means cluster structure Key: SPARK-11664 URL: https://issues.apache.org/jira/browse/SPARK-11664 Project: Spark Issue Type: Sub-task Components: MLlib Reporter: Yu Ishikawa Priority: Minor I think users want to visualize the result of bisecting k-means clustering as a dendrogram in order to confirm it. So it would be great to support methods to get the cluster tree structure as an adjacency list, a linkage matrix, and so on.
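A purely hypothetical sketch of the requested API; mllib's bisecting k-means does not currently expose its cluster tree, so {{ClusterNode}} below is an imagined node type, not an existing class.

{code}
// Pre-order walk emitting (parentIndex, childIndex) edges: an adjacency list
// a caller could feed into a dendrogram plot or convert to a linkage matrix.
case class ClusterNode(index: Int, children: Seq[ClusterNode])

def toAdjacencyList(root: ClusterNode): Seq[(Int, Int)] =
  root.children.flatMap { child =>
    (root.index, child.index) +: toAdjacencyList(child)
  }
{code}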
[jira] [Created] (SPARK-11665) Support other distance metrics for bisecting k-means
Yu Ishikawa created SPARK-11665: --- Summary: Support other distance metrics for bisecting k-means Key: SPARK-11665 URL: https://issues.apache.org/jira/browse/SPARK-11665 Project: Spark Issue Type: Sub-task Components: MLlib Reporter: Yu Ishikawa Priority: Minor Some people have requested support for other distance metrics in bisecting k-means, such as cosine distance and Tanimoto distance. We should: - design the interfaces for distance metrics - support those distances
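As a starting point for the interface discussion, a hedged sketch of one possible design (not an existing Spark API):

{code}
import org.apache.spark.mllib.linalg.Vector

// A pluggable distance function the bisecting k-means implementation could
// accept as a parameter instead of hard-coding Euclidean distance.
trait DistanceMeasure extends Serializable {
  def distance(v1: Vector, v2: Vector): Double
}

object CosineDistance extends DistanceMeasure {
  override def distance(v1: Vector, v2: Vector): Double = {
    val a = v1.toArray
    val b = v2.toArray
    val dot = a.zip(b).map { case (x, y) => x * y }.sum
    val norms = math.sqrt(a.map(x => x * x).sum) * math.sqrt(b.map(x => x * x).sum)
    1.0 - dot / norms // cosine distance = 1 - cosine similarity
  }
}
{code}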
[jira] [Updated] (SPARK-8459) Add import/export to spark.mllib bisecting k-means
[ https://issues.apache.org/jira/browse/SPARK-8459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Ishikawa updated SPARK-8459: --- Labels: 1.7.0 (was: ) > Add import/export to spark.mllib bisecting k-means > -- > > Key: SPARK-8459 > URL: https://issues.apache.org/jira/browse/SPARK-8459 > Project: Spark > Issue Type: New Feature > Components: MLlib >Reporter: Yu Ishikawa > Labels: 1.7.0
[jira] [Commented] (SPARK-11611) Python API for bisecting k-means
[ https://issues.apache.org/jira/browse/SPARK-11611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14997875#comment-14997875 ] Yu Ishikawa commented on SPARK-11611: - [~mengxr] can we change the target version from 1.7.0 to 1.6.0? > Python API for bisecting k-means > > > Key: SPARK-11611 > URL: https://issues.apache.org/jira/browse/SPARK-11611 > Project: Spark > Issue Type: New Feature > Components: MLlib, PySpark >Reporter: Xiangrui Meng > > Implement Python API for bisecting k-means.
[jira] [Updated] (SPARK-11610) Make the docs of LDAModel.describeTopics in Python more specific
[ https://issues.apache.org/jira/browse/SPARK-11610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Ishikawa updated SPARK-11610: Target Version/s: (was: 1.6.0) > Make the docs of LDAModel.describeTopics in Python more specific > > > Key: SPARK-11610 > URL: https://issues.apache.org/jira/browse/SPARK-11610 > Project: Spark > Issue Type: Sub-task > Components: MLlib, PySpark >Reporter: Yu Ishikawa >Assignee: Yu Ishikawa >Priority: Trivial
[jira] [Commented] (SPARK-11610) Make the docs of LDAModel.describeTopics in Python more specific
[ https://issues.apache.org/jira/browse/SPARK-11610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14997716#comment-14997716 ] Yu Ishikawa commented on SPARK-11610: - [~josephkb] sorry for that. Thanks for fixing it. > Make the docs of LDAModel.describeTopics in Python more specific > > > Key: SPARK-11610 > URL: https://issues.apache.org/jira/browse/SPARK-11610 > Project: Spark > Issue Type: Sub-task > Components: MLlib, PySpark >Reporter: Yu Ishikawa >Assignee: Yu Ishikawa >Priority: Trivial
[jira] [Commented] (SPARK-11611) Python API for bisecting k-means
[ https://issues.apache.org/jira/browse/SPARK-11611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14997725#comment-14997725 ] Yu Ishikawa commented on SPARK-11611: - I'm working on this issue. > Python API for bisecting k-means > > > Key: SPARK-11611 > URL: https://issues.apache.org/jira/browse/SPARK-11611 > Project: Spark > Issue Type: New Feature > Components: MLlib, PySpark >Reporter: Xiangrui Meng > > Implement Python API for bisecting k-means.
[jira] [Comment Edited] (SPARK-6517) Bisecting k-means clustering
[ https://issues.apache.org/jira/browse/SPARK-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14997724#comment-14997724 ] Yu Ishikawa edited comment on SPARK-6517 at 11/10/15 12:20 AM: --- [~jeffzhang] thank you for your cooperation. But, I'm ready to send a PR for that issue. was (Author: yuu.ishik...@gmail.com): [~jeffzhang] thank you for your cooperation. But, I'm readdy to send a PR for that issue. > Bisecting k-means clustering > > > Key: SPARK-6517 > URL: https://issues.apache.org/jira/browse/SPARK-6517 > Project: Spark > Issue Type: New Feature > Components: MLlib >Reporter: Yu Ishikawa >Assignee: Yu Ishikawa > Labels: clustering > Fix For: 1.6.0
[jira] [Commented] (SPARK-6517) Bisecting k-means clustering
[ https://issues.apache.org/jira/browse/SPARK-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14997724#comment-14997724 ] Yu Ishikawa commented on SPARK-6517: [~jeffzhang] thank you for your cooperation. But, I'm readdy to send a PR for that issue. > Bisecting k-means clustering > > > Key: SPARK-6517 > URL: https://issues.apache.org/jira/browse/SPARK-6517 > Project: Spark > Issue Type: New Feature > Components: MLlib >Reporter: Yu Ishikawa >Assignee: Yu Ishikawa > Labels: clustering > Fix For: 1.6.0
[jira] [Created] (SPARK-11610) Make the docs of LDAModel.describeTopics in Python more specific
Yu Ishikawa created SPARK-11610: --- Summary: Make the docs of LDAModel.describeTopics in Python more specific Key: SPARK-11610 URL: https://issues.apache.org/jira/browse/SPARK-11610 Project: Spark Issue Type: Sub-task Components: MLlib, PySpark Reporter: Yu Ishikawa Priority: Trivial
[jira] [Created] (SPARK-11566) Refactoring GaussianMixtureModel.gaussians in Python
Yu Ishikawa created SPARK-11566: --- Summary: Refactoring GaussianMixtureModel.gaussians in Python Key: SPARK-11566 URL: https://issues.apache.org/jira/browse/SPARK-11566 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 1.5.1 Reporter: Yu Ishikawa Priority: Trivial We could also implement {{GaussianMixtureModelWrapper.gaussians}} in Scala with {{SerDe.dumps}}, instead of returning Java {{Object}}. So, it would be a little simpler and more efficient.
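A hedged sketch of the proposed refactoring, modeled on how other mllib Python wrappers pickle values; the exact shape of the final patch may differ, and this assumes the code lives in the {{org.apache.spark.mllib.api.python}} package where {{SerDe}} is visible.

{code}
import scala.collection.JavaConverters._
import org.apache.spark.mllib.api.python.SerDe
import org.apache.spark.mllib.clustering.GaussianMixtureModel

// Build the list of (mu, sigma) pairs on the Scala side and pickle it once
// with SerDe.dumps, instead of handing an opaque Java object to Python.
class GaussianMixtureModelWrapper(model: GaussianMixtureModel) {
  val gaussians: Array[Byte] = {
    val modelGaussians = model.gaussians.map { g => Array[Any](g.mu, g.sigma) }
    SerDe.dumps(modelGaussians.toSeq.asJava)
  }
}
{code}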
[jira] [Updated] (SPARK-11566) Refactoring GaussianMixtureModel.gaussians in Python
[ https://issues.apache.org/jira/browse/SPARK-11566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Ishikawa updated SPARK-11566: Component/s: PySpark > Refactoring GaussianMixtureModel.gaussians in Python > > > Key: SPARK-11566 > URL: https://issues.apache.org/jira/browse/SPARK-11566 > Project: Spark > Issue Type: Improvement > Components: MLlib, PySpark >Affects Versions: 1.5.1 >Reporter: Yu Ishikawa >Priority: Trivial > > We could also implement {{GaussianMixtureModelWrapper.gaussians}} in Scala > with {{SerDe.dumps}}, instead of returning Java {{Object}}. So, it would be a > little simpler and more efficient.
[jira] [Commented] (SPARK-11515) QuantileDiscretizer should take random seed
[ https://issues.apache.org/jira/browse/SPARK-11515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14991294#comment-14991294 ] Yu Ishikawa commented on SPARK-11515: - I'll work on this issue. > QuantileDiscretizer should take random seed > --- > > Key: SPARK-11515 > URL: https://issues.apache.org/jira/browse/SPARK-11515 > Project: Spark > Issue Type: New Feature > Components: ML >Reporter: Joseph K. Bradley >Priority: Minor > > QuantileDiscretizer takes a random sample to select bins. It currently does > not specify a seed for the XORShiftRandom, but it should take a seed by > extending the HasSeed Param. > 
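A simplified, standalone sketch of the idea; the real patch would extend ml's shared {{HasSeed}} trait and keep using {{XORShiftRandom}}, so everything below is illustrative.

{code}
import scala.util.Random

// A seed parameter makes the bin-boundary sampling reproducible.
trait HasSeedSketch {
  private var seedValue: Long = this.getClass.getName.hashCode.toLong
  def setSeed(value: Long): this.type = { seedValue = value; this }
  def getSeed: Long = seedValue
}

class QuantileDiscretizerSketch extends HasSeedSketch {
  // Draw the sample used to choose bin boundaries with the user-set seed
  // (previously the RNG seed was hard-coded).
  def sampleForSplits(data: Seq[Double], sampleSize: Int): Seq[Double] = {
    val rng = new Random(getSeed)
    rng.shuffle(data).take(sampleSize).sorted
  }
}
{code}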
[jira] [Commented] (SPARK-9722) Pass random seed to spark.ml RandomForest findSplitsBins
[ https://issues.apache.org/jira/browse/SPARK-9722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14990877#comment-14990877 ] Yu Ishikawa commented on SPARK-9722: [~josephkb] I'll add a seed Param to {{DecisionTreeClassifier}} and {{DecisionTreeRegressor}}. > Pass random seed to spark.ml RandomForest findSplitsBins > > > Key: SPARK-9722 > URL: https://issues.apache.org/jira/browse/SPARK-9722 > Project: Spark > Issue Type: Improvement > Components: ML >Reporter: Joseph K. Bradley >Assignee: Yu Ishikawa >Priority: Trivial > Fix For: 1.6.0 > > > Trees use XORShiftRandom when binning continuous features. Currently, they > use a fixed seed of 1. They should accept a random seed param and use that > instead.
[jira] [Commented] (SPARK-10729) word2vec model save for python
[ https://issues.apache.org/jira/browse/SPARK-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14991206#comment-14991206 ] Yu Ishikawa commented on SPARK-10729: - Sorry, the cause isn't `@inherit_doc`. I misunderstood. Anyway, we should discuss the documentation. > word2vec model save for python > -- > > Key: SPARK-10729 > URL: https://issues.apache.org/jira/browse/SPARK-10729 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 1.4.1, 1.5.0 >Reporter: Joseph A Gartner III > Fix For: 1.5.0 > > > The ability to save a word2vec model has not been ported to python, and would > be extremely useful to have given the long training period.
[jira] [Created] (SPARK-11492) Ignore commented code warnings in SparkR
Yu Ishikawa created SPARK-11492: --- Summary: Ignore commented code warnings in SparkR Key: SPARK-11492 URL: https://issues.apache.org/jira/browse/SPARK-11492 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa Priority: Minor cc [~shivaram] We have many commented code warnings under the latest lintr. Should we turn off the {{commented_code_linter}} rule of lintr? {noformat} # rdd <- lapply(parallelize(sc, 1:10), function(x) list(a=x, b=as.character(x))) ^~ R/SQLContext.R:176:3: style: Commented code should be removed. # df <- toDF(rdd) ^~~ R/SQLContext.R:232:3: style: Commented code should be removed. # sc <- sparkR.init() ^~~ R/SQLContext.R:233:3: style: Commented code should be removed. # sqlContext <- sparkRSQL.init(sc) ^~~~ R/SQLContext.R:234:3: style: Commented code should be removed. # rdd <- texFile(sc, "path/to/json") ^~ R/SQLContext.R:235:3: style: Commented code should be removed. # df <- jsonRDD(sqlContext, rdd) ^~ {noformat}
[jira] [Closed] (SPARK-11492) Ignore commented code warnings in SparkR
[ https://issues.apache.org/jira/browse/SPARK-11492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Ishikawa closed SPARK-11492. --- Resolution: Duplicate > Ignore commented code warnings in SparkR > > > Key: SPARK-11492 > URL: https://issues.apache.org/jira/browse/SPARK-11492 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Reporter: Yu Ishikawa >Priority: Minor > Fix For: 1.5.0 > > > cc [~shivaram] > We have many commented code warnings under the latest lintr. Should we turn > off the {{commented_code_linter}} rule of lintr? > {noformat} > # rdd <- lapply(parallelize(sc, 1:10), function(x) list(a=x, > b=as.character(x))) > > ^~ > R/SQLContext.R:176:3: style: Commented code should be removed. > # df <- toDF(rdd) > ^~~ > R/SQLContext.R:232:3: style: Commented code should be removed. > # sc <- sparkR.init() > ^~~ > R/SQLContext.R:233:3: style: Commented code should be removed. > # sqlContext <- sparkRSQL.init(sc) > ^~~~ > R/SQLContext.R:234:3: style: Commented code should be removed. > # rdd <- texFile(sc, "path/to/json") > ^~ > R/SQLContext.R:235:3: style: Commented code should be removed. > # df <- jsonRDD(sqlContext, rdd) > ^~ > {noformat}
[jira] [Commented] (SPARK-11492) Ignore commented code warnings in SparkR
[ https://issues.apache.org/jira/browse/SPARK-11492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988952#comment-14988952 ] Yu Ishikawa commented on SPARK-11492: - [~andreas.fe...@herold.at] sorry about that, and thank you for letting me know. > Ignore commented code warnings in SparkR > > > Key: SPARK-11492 > URL: https://issues.apache.org/jira/browse/SPARK-11492 > Project: Spark > Issue Type: Sub-task > Components: SparkR >Reporter: Yu Ishikawa >Priority: Minor > Fix For: 1.5.0 > > > cc [~shivaram] > We have many commented code warnings under the latest lintr. Should we turn > off the {{commented_code_linter}} rule of lintr? > {noformat} > # rdd <- lapply(parallelize(sc, 1:10), function(x) list(a=x, > b=as.character(x))) > > ^~ > R/SQLContext.R:176:3: style: Commented code should be removed. > # df <- toDF(rdd) > ^~~ > R/SQLContext.R:232:3: style: Commented code should be removed. > # sc <- sparkR.init() > ^~~ > R/SQLContext.R:233:3: style: Commented code should be removed. > # sqlContext <- sparkRSQL.init(sc) > ^~~~ > R/SQLContext.R:234:3: style: Commented code should be removed. > # rdd <- texFile(sc, "path/to/json") > ^~ > R/SQLContext.R:235:3: style: Commented code should be removed. > # df <- jsonRDD(sqlContext, rdd) > ^~ > {noformat}
[jira] [Commented] (SPARK-10729) word2vec model save for python
[ https://issues.apache.org/jira/browse/SPARK-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988962#comment-14988962 ] Yu Ishikawa commented on SPARK-10729: - Got it. That's because Word2VecModel in Python doesn't have {{@inherit_doc}} attached. I think we should add the tag. Could you close this issue? It would be great to discuss improving the documentation in another issue. > word2vec model save for python > -- > > Key: SPARK-10729 > URL: https://issues.apache.org/jira/browse/SPARK-10729 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 1.4.1, 1.5.0 >Reporter: Joseph A Gartner III > > The ability to save a word2vec model has not been ported to python, and would > be extremely useful to have given the long training period.
[jira] [Commented] (SPARK-10729) word2vec model save for python
[ https://issues.apache.org/jira/browse/SPARK-10729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988926#comment-14988926 ] Yu Ishikawa commented on SPARK-10729: - [~jgartner] Is this issue the same as https://issues.apache.org/jira/browse/SPARK-7104? > word2vec model save for python > -- > > Key: SPARK-10729 > URL: https://issues.apache.org/jira/browse/SPARK-10729 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 1.4.1, 1.5.0 >Reporter: Joseph A Gartner III > > The ability to save a word2vec model has not been ported to python, and would > be extremely useful to have given the long training period.
[jira] [Commented] (SPARK-11263) lintr Throws Warnings on Commented Code in Documentation
[ https://issues.apache.org/jira/browse/SPARK-11263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988965#comment-14988965 ] Yu Ishikawa commented on SPARK-11263: - [~felixcheung] I think it would be great to ignore the commented code warnings in the lintr settings. > lintr Throws Warnings on Commented Code in Documentation > > > Key: SPARK-11263 > URL: https://issues.apache.org/jira/browse/SPARK-11263 > Project: Spark > Issue Type: Task > Components: SparkR >Reporter: Sen Fang >Priority: Minor > > This comes from a discussion in https://github.com/apache/spark/pull/9205 > Currently lintr throws many warnings around "style: Commented code should be > removed." > For example > {code} > R/RDD.R:260:3: style: Commented code should be removed. > # unpersist(rdd) # rdd@@env$isCached == FALSE > ^~~ > R/RDD.R:283:3: style: Commented code should be removed. > # sc <- sparkR.init() > ^~~ > R/RDD.R:284:3: style: Commented code should be removed. > # setCheckpointDir(sc, "checkpoint") > ^~ > {code} > Some of them are legitimate warnings but most of them are simply code > examples of functions that are not part of the public API. For example > {code} > # @examples > #\dontrun{ > # sc <- sparkR.init() > # rdd <- parallelize(sc, 1:10, 2L) > # cache(rdd) > #} > {code} > One workaround is to convert them back to Roxygen doc but assign {{#' @rdname > .ignore}} and Roxygen will skip these functions with the message {{Skipping > invalid path: .ignore.Rd}} > That being said, I feel people usually praise/criticize R package > documentation as "expert friendly". The convention seems to be providing as > much documentation as possible but not exporting functions that are unstable or > developer-only. If users choose to use them, they acknowledge the risk by > using {{:::}}.
[jira] [Commented] (SPARK-6001) K-Means clusterer should return the assignments of input points to clusters
[ https://issues.apache.org/jira/browse/SPARK-6001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14984574#comment-14984574 ] Yu Ishikawa commented on SPARK-6001: [~josephkb] can we close this issue? > K-Means clusterer should return the assignments of input points to clusters > --- > > Key: SPARK-6001 > URL: https://issues.apache.org/jira/browse/SPARK-6001 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 1.2.1 >Reporter: Derrick Burns >Priority: Minor > > The K-Means clusterer returns a KMeansModel that contains the cluster > centers. However, when available, I suggest that the K-Means clusterer also > return an RDD of the assignments of the input data to the clusters. While the > assignments can be computed given the KMeansModel, why not return assignments > if they are available to save re-computation costs. > The K-means implementation at > https://github.com/derrickburns/generalized-kmeans-clustering returns the > assignments when available.
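As the description itself notes, the assignments are recoverable from the model; a hedged one-liner sketch:

{code}
import org.apache.spark.mllib.clustering.KMeansModel
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

// predict() maps each point to its cluster index; zip pairs points with
// their assignments (predict is a map, so the partitioning lines up).
def assignments(model: KMeansModel, data: RDD[Vector]): RDD[(Vector, Int)] =
  data.zip(model.predict(data))
{code}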
[jira] [Commented] (SPARK-10266) Add @Since annotation to ml.tuning
[ https://issues.apache.org/jira/browse/SPARK-10266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14979152#comment-14979152 ] Yu Ishikawa commented on SPARK-10266: - Instead of [~Ehsan Mohyedin Kermani], I'll work on this issue. > Add @Since annotation to ml.tuning > -- > > Key: SPARK-10266 > URL: https://issues.apache.org/jira/browse/SPARK-10266 > Project: Spark > Issue Type: Sub-task > Components: Documentation, ML >Reporter: Xiangrui Meng >Assignee: Ehsan Mohyedin Kermani >Priority: Minor > Labels: starter
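Illustrative only: the kind of annotation this task adds across ml.tuning (the class and version strings below are placeholders, not verified against the real history).

{code}
import org.apache.spark.annotation.Since

class ExampleTuner {
  // @Since records the Spark release that introduced a public API element.
  @Since("1.2.0")
  def fitWithFolds(numFolds: Int): Unit = { /* ... */ }
}
{code}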
[jira] [Commented] (SPARK-10285) Add @since annotation to pyspark.ml.util
[ https://issues.apache.org/jira/browse/SPARK-10285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791815#comment-14791815 ] Yu Ishikawa commented on SPARK-10285: - Closing this PR because those are non-public APIs. > Add @since annotation to pyspark.ml.util > > > Key: SPARK-10285 > URL: https://issues.apache.org/jira/browse/SPARK-10285 > Project: Spark > Issue Type: Sub-task > Components: Documentation, ML, PySpark >Reporter: Xiangrui Meng >Assignee: Yu Ishikawa >Priority: Minor > Labels: starter
[jira] [Updated] (SPARK-10512) Fix @since when a function doesn't have doc
[ https://issues.apache.org/jira/browse/SPARK-10512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Ishikawa updated SPARK-10512: Description: When I tried to add @since to a function which doesn't have a doc, @since didn't work. It seems that {{__doc__}} is {{None}} under the {{since}} decorator. {noformat} Traceback (most recent call last): File "/Users/01004981/.pyenv/versions/2.6.8/lib/python2.6/runpy.py", line 122, in _run_module_as_main "__main__", fname, loader, pkg_name) File "/Users/01004981/.pyenv/versions/2.6.8/lib/python2.6/runpy.py", line 34, in _run_code exec code in run_globals File "/Users/01004981/local/src/spark/myspark3/python/pyspark/mllib/recommendation.py", line 46, in <module> class MatrixFactorizationModel(JavaModelWrapper, JavaSaveable, JavaLoader): File "/Users/01004981/local/src/spark/myspark3/python/pyspark/mllib/recommendation.py", line 166, in MatrixFactorizationModel @since("1.3.1") File "/Users/01004981/local/src/spark/myspark3/python/pyspark/__init__.py", line 63, in deco indents = indent_p.findall(f.__doc__) TypeError: expected string or buffer {noformat} was: When I tried to add @since to a function which doesn't have a doc, @since didn't work. It seems that {{__doc__}} is {{None]} under the {{since}} decorator. {noformat} Traceback (most recent call last): File "/Users/01004981/.pyenv/versions/2.6.8/lib/python2.6/runpy.py", line 122, in _run_module_as_main "__main__", fname, loader, pkg_name) File "/Users/01004981/.pyenv/versions/2.6.8/lib/python2.6/runpy.py", line 34, in _run_code exec code in run_globals File "/Users/01004981/local/src/spark/myspark3/python/pyspark/mllib/recommendation.py", line 46, in <module> class MatrixFactorizationModel(JavaModelWrapper, JavaSaveable, JavaLoader): File "/Users/01004981/local/src/spark/myspark3/python/pyspark/mllib/recommendation.py", line 166, in MatrixFactorizationModel @since("1.3.1") File "/Users/01004981/local/src/spark/myspark3/python/pyspark/__init__.py", line 63, in deco indents = indent_p.findall(f.__doc__) TypeError: expected string or buffer {noformat} > Fix @since when a function doesn't have doc > --- > > Key: SPARK-10512 > URL: https://issues.apache.org/jira/browse/SPARK-10512 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 1.6.0 >Reporter: Yu Ishikawa > > When I tried to add @since to a function which doesn't have a doc, @since > didn't work. It seems that {{__doc__}} is {{None}} under the {{since}} > decorator. > {noformat} > Traceback (most recent call last): > File "/Users/01004981/.pyenv/versions/2.6.8/lib/python2.6/runpy.py", line > 122, in _run_module_as_main > "__main__", fname, loader, pkg_name) > File "/Users/01004981/.pyenv/versions/2.6.8/lib/python2.6/runpy.py", line > 34, in _run_code > exec code in run_globals > File > "/Users/01004981/local/src/spark/myspark3/python/pyspark/mllib/recommendation.py", > line 46, in <module> > class MatrixFactorizationModel(JavaModelWrapper, JavaSaveable, > JavaLoader): > File > "/Users/01004981/local/src/spark/myspark3/python/pyspark/mllib/recommendation.py", > line 166, in MatrixFactorizationModel > @since("1.3.1") > File "/Users/01004981/local/src/spark/myspark3/python/pyspark/__init__.py", > line 63, in deco > indents = indent_p.findall(f.__doc__) > TypeError: expected string or buffer > {noformat}
[jira] [Updated] (SPARK-10512) Fix @since when a function doesn't have doc
[ https://issues.apache.org/jira/browse/SPARK-10512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Ishikawa updated SPARK-10512: Description: When I tried to add @since to a function which doesn't have doc, @since didn't go well. It seems that {{__doc__}} is {{None]} under the {{since}} decorator. {noformat} Traceback (most recent call last): File "/Users/01004981/.pyenv/versions/2.6.8/lib/python2.6/runpy.py", line 122, in _run_module_as_main "__main__", fname, loader, pkg_name) File "/Users/01004981/.pyenv/versions/2.6.8/lib/python2.6/runpy.py", line 34, in _run_code exec code in run_globals File "/Users/01004981/local/src/spark/myspark3/python/pyspark/mllib/recommendation.py", line 46, in class MatrixFactorizationModel(JavaModelWrapper, JavaSaveable, JavaLoader): File "/Users/01004981/local/src/spark/myspark3/python/pyspark/mllib/recommendation.py", line 166, in MatrixFactorizationModel @since("1.3.1") File "/Users/01004981/local/src/spark/myspark3/python/pyspark/__init__.py", line 63, in deco indents = indent_p.findall(f.__doc__) TypeError: expected string or buffer {noformat} was: When I tried to add @since to a function which doesn't have doc, @since didn't go well. It seems that {{__doc__}} is {{None]} under the {{since}} decorator. ``` Traceback (most recent call last): File "/Users/01004981/.pyenv/versions/2.6.8/lib/python2.6/runpy.py", line 122, in _run_module_as_main "__main__", fname, loader, pkg_name) File "/Users/01004981/.pyenv/versions/2.6.8/lib/python2.6/runpy.py", line 34, in _run_code exec code in run_globals File "/Users/01004981/local/src/spark/myspark3/python/pyspark/mllib/recommendation.py", line 46, in class MatrixFactorizationModel(JavaModelWrapper, JavaSaveable, JavaLoader): File "/Users/01004981/local/src/spark/myspark3/python/pyspark/mllib/recommendation.py", line 166, in MatrixFactorizationModel @since("1.3.1") File "/Users/01004981/local/src/spark/myspark3/python/pyspark/__init__.py", line 63, in deco indents = indent_p.findall(f.__doc__) TypeError: expected string or buffer ``` > Fix @since when a function doesn't have doc > --- > > Key: SPARK-10512 > URL: https://issues.apache.org/jira/browse/SPARK-10512 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 1.6.0 >Reporter: Yu Ishikawa > > When I tried to add @since to a function which doesn't have doc, @since > didn't go well. It seems that {{__doc__}} is {{None]} under the {{since}} > decorator. > {noformat} > Traceback (most recent call last): > File "/Users/01004981/.pyenv/versions/2.6.8/lib/python2.6/runpy.py", line > 122, in _run_module_as_main > "__main__", fname, loader, pkg_name) > File "/Users/01004981/.pyenv/versions/2.6.8/lib/python2.6/runpy.py", line > 34, in _run_code > exec code in run_globals > File > "/Users/01004981/local/src/spark/myspark3/python/pyspark/mllib/recommendation.py", > line 46, in > class MatrixFactorizationModel(JavaModelWrapper, JavaSaveable, > JavaLoader): > File > "/Users/01004981/local/src/spark/myspark3/python/pyspark/mllib/recommendation.py", > line 166, in MatrixFactorizationModel > @since("1.3.1") > File "/Users/01004981/local/src/spark/myspark3/python/pyspark/__init__.py", > line 63, in deco > indents = indent_p.findall(f.__doc__) > TypeError: expected string or buffer > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10512) Fix @since when a function doesn't have doc
Yu Ishikawa created SPARK-10512: --- Summary: Fix @since when a function doesn't have doc Key: SPARK-10512 URL: https://issues.apache.org/jira/browse/SPARK-10512 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 1.6.0 Reporter: Yu Ishikawa When I tried to add @since to a function which doesn't have doc, @since didn't go well. It seems that {{__doc__}} is {{None}} under the {{since}} decorator. ``` Traceback (most recent call last): File "/Users/01004981/.pyenv/versions/2.6.8/lib/python2.6/runpy.py", line 122, in _run_module_as_main "__main__", fname, loader, pkg_name) File "/Users/01004981/.pyenv/versions/2.6.8/lib/python2.6/runpy.py", line 34, in _run_code exec code in run_globals File "/Users/01004981/local/src/spark/myspark3/python/pyspark/mllib/recommendation.py", line 46, in class MatrixFactorizationModel(JavaModelWrapper, JavaSaveable, JavaLoader): File "/Users/01004981/local/src/spark/myspark3/python/pyspark/mllib/recommendation.py", line 166, in MatrixFactorizationModel @since("1.3.1") File "/Users/01004981/local/src/spark/myspark3/python/pyspark/__init__.py", line 63, in deco indents = indent_p.findall(f.__doc__) TypeError: expected string or buffer ``` -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10276) Add @since annotation to pyspark.mllib.recommendation
[ https://issues.apache.org/jira/browse/SPARK-10276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14736346#comment-14736346 ] Yu Ishikawa commented on SPARK-10276: - [~mengxr] should we add `@since` to the class methods with `@classmethod` in PySpark? When I tried to do that, I got an error as follows. It seems that we can't rewrite {{__doc__}} of a `classmethod`. {noformat} Traceback (most recent call last): File "/Users/01004981/.pyenv/versions/2.6.8/lib/python2.6/runpy.py", line 122, in _run_module_as_main "__main__", fname, loader, pkg_name) File "/Users/01004981/.pyenv/versions/2.6.8/lib/python2.6/runpy.py", line 34, in _run_code exec code in run_globals File "/Users/01004981/local/src/spark/myspark3/python/pyspark/mllib/recommendation.py", line 46, in class MatrixFactorizationModel(JavaModelWrapper, JavaSaveable, JavaLoader): File "/Users/01004981/local/src/spark/myspark3/python/pyspark/mllib/recommendation.py", line 175, in MatrixFactorizationModel @classmethod File "/Users/01004981/local/src/spark/myspark3/python/pyspark/__init__.py", line 62, in deco f.__doc__ = f.__doc__.rstrip() + "\n\n%s.. versionadded:: %s" % (indent, version) AttributeError: 'classmethod' object attribute '__doc__' is read-only {noformat} > Add @since annotation to pyspark.mllib.recommendation > - > > Key: SPARK-10276 > URL: https://issues.apache.org/jira/browse/SPARK-10276 > Project: Spark > Issue Type: Sub-task > Components: Documentation, MLlib, PySpark >Reporter: Xiangrui Meng >Priority: Minor > Labels: starter > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10512) Fix @since when a function doesn't have doc
[ https://issues.apache.org/jira/browse/SPARK-10512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14736973#comment-14736973 ] Yu Ishikawa commented on SPARK-10512: - [~davies] oh, I see. Thank you for letting me know. > Fix @since when a function doesn't have doc > --- > > Key: SPARK-10512 > URL: https://issues.apache.org/jira/browse/SPARK-10512 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 1.6.0 >Reporter: Yu Ishikawa > > When I tried to add @since to a function which doesn't have doc, @since > didn't go well. It seems that {{__doc__}} is {{None}} under the {{since}} > decorator. > {noformat} > Traceback (most recent call last): > File "/Users/01004981/.pyenv/versions/2.6.8/lib/python2.6/runpy.py", line > 122, in _run_module_as_main > "__main__", fname, loader, pkg_name) > File "/Users/01004981/.pyenv/versions/2.6.8/lib/python2.6/runpy.py", line > 34, in _run_code > exec code in run_globals > File > "/Users/01004981/local/src/spark/myspark3/python/pyspark/mllib/recommendation.py", > line 46, in > class MatrixFactorizationModel(JavaModelWrapper, JavaSaveable, > JavaLoader): > File > "/Users/01004981/local/src/spark/myspark3/python/pyspark/mllib/recommendation.py", > line 166, in MatrixFactorizationModel > @since("1.3.1") > File "/Users/01004981/local/src/spark/myspark3/python/pyspark/__init__.py", > line 63, in deco > indents = indent_p.findall(f.__doc__) > TypeError: expected string or buffer > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10276) Add @since annotation to pyspark.mllib.recommendation
[ https://issues.apache.org/jira/browse/SPARK-10276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14738002#comment-14738002 ] Yu Ishikawa commented on SPARK-10276: - It seems that `@since` depends on the order of decorators. {noformat} # Works @classmethod @since("1.4.0") def foo(cls): # Does not work @since("1.4.0") @classmethod def bar(cls): {noformat} > Add @since annotation to pyspark.mllib.recommendation > - > > Key: SPARK-10276 > URL: https://issues.apache.org/jira/browse/SPARK-10276 > Project: Spark > Issue Type: Sub-task > Components: Documentation, MLlib, PySpark >Reporter: Xiangrui Meng >Priority: Minor > Labels: starter > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
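A runnable illustration of that ordering constraint, using the simplified {{since}} sketched earlier (Python 2 semantics assumed, where a {{classmethod}} object's {{__doc__}} is read-only):
{noformat}
def since(version):
    # simplified variant of the decorator sketched earlier
    def deco(f):
        f.__doc__ = (f.__doc__ or "").rstrip() + "\n\n.. versionadded:: %s" % version
        return f
    return deco

class Model(object):
    # Works: @since runs first on the plain function, then @classmethod wraps it.
    @classmethod
    @since("1.4.0")
    def foo(cls):
        """Docstring."""

    # Does not work: @classmethod runs first, so @since receives a classmethod
    # object whose __doc__ is read-only on Python 2, raising the AttributeError
    # quoted in the comment above.
    #
    # @since("1.4.0")
    # @classmethod
    # def bar(cls):
    #     """Docstring."""
{noformat}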
[jira] [Commented] (SPARK-8467) Add LDAModel.describeTopics() in Python
[ https://issues.apache.org/jira/browse/SPARK-8467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14733729#comment-14733729 ] Yu Ishikawa commented on SPARK-8467: Sorry for the delayed reply. I just sent a PR for this issue. Thanks. > Add LDAModel.describeTopics() in Python > --- > > Key: SPARK-8467 > URL: https://issues.apache.org/jira/browse/SPARK-8467 > Project: Spark > Issue Type: New Feature > Components: MLlib, PySpark >Reporter: Yu Ishikawa > > Add LDAModel.describeTopics() in Python. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10260) Add @Since annotation to ml.clustering
[ https://issues.apache.org/jira/browse/SPARK-10260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14712682#comment-14712682 ] Yu Ishikawa commented on SPARK-10260: - I'll work on this issue. Add @Since annotation to ml.clustering -- Key: SPARK-10260 URL: https://issues.apache.org/jira/browse/SPARK-10260 Project: Spark Issue Type: Sub-task Components: Documentation, ML Reporter: Xiangrui Meng Priority: Minor Labels: starter -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10214) Improve SparkR Column, DataFrame API docs
[ https://issues.apache.org/jira/browse/SPARK-10214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14710528#comment-14710528 ] Yu Ishikawa commented on SPARK-10214: - [~shivaram] Apart from this issue, why aren't the RDD functions written as roxygen2 documentation? That is, there are no Rd files for them, since their comment lines begin with {{#}}, not {{#'}}. https://github.com/apache/spark/blob/master/R/pkg/R/RDD.R#L114 Improve SparkR Column, DataFrame API docs - Key: SPARK-10214 URL: https://issues.apache.org/jira/browse/SPARK-10214 Project: Spark Issue Type: Documentation Components: SparkR Reporter: Shivaram Venkataraman Right now the docs for functions like `agg` and `filter` have duplicate entries like `agg-method` and `filter-method` etc. We should use the `name` Rd tag and remove these duplicates. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10214) Improve SparkR Column, DataFrame API docs
[ https://issues.apache.org/jira/browse/SPARK-10214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14710578#comment-14710578 ] Yu Ishikawa commented on SPARK-10214: - I understand. Thanks! Improve SparkR Column, DataFrame API docs - Key: SPARK-10214 URL: https://issues.apache.org/jira/browse/SPARK-10214 Project: Spark Issue Type: Documentation Components: SparkR Reporter: Shivaram Venkataraman Right now the docs for functions like `agg` and `filter` have duplicate entries like `agg-method` and `filter-method` etc. We should use the `name` Rd tag and remove these duplicates. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10118) Improve SparkR API docs for 1.5 release
[ https://issues.apache.org/jira/browse/SPARK-10118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14708645#comment-14708645 ] Yu Ishikawa commented on SPARK-10118: - [~shivaram] sure. I'll send a PR about that later. Improve SparkR API docs for 1.5 release --- Key: SPARK-10118 URL: https://issues.apache.org/jira/browse/SPARK-10118 Project: Spark Issue Type: Documentation Components: Documentation, SparkR Reporter: Shivaram Venkataraman This includes checking if the new DataFrame functions & expressions show up appropriately in the roxygen docs -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-10118) Improve SparkR API docs for 1.5 release
[ https://issues.apache.org/jira/browse/SPARK-10118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14707631#comment-14707631 ] Yu Ishikawa edited comment on SPARK-10118 at 8/21/15 11:22 PM: --- [~shivaram] It seems that the Rd files of dplyr and plyr are split by function. I asked the creator of roxygen2 whether we should split functions.Rd, and he answered that it would improve readability! And I think {{@family}} would be useful. h3. Suggestion - Split functions.Rd into one Rd file per function - Use {{@family}} to relate the functions in the API docs. It would be better to follow Scala's {{@group}} For example, {{add_months}} is documented: {noformat} # generic.R #' @rdname add_months #' @export setGeneric("add_months", function(y, x) { standardGeneric("add_months") }) # functions.R #' add_months #' #' Returns the date that is numMonths after startDate. #' #' @family datetime_funcs #' @rdname add_months #' @export setMethod("add_months", signature(y = "Column", x = "numeric"), function(y, x) { jc <- callJStatic("org.apache.spark.sql.functions", "add_months", y@jc, as.integer(x)) column(jc) }) {noformat} h3. The Rd files of dplyr and plyr - https://github.com/hadley/dplyr/tree/master/man - https://github.com/hadley/plyr/tree/master/man h3. Reference of {{@family}} {quote} If you have a family of related functions where every function should link to every other function in the family, use @family. The value of @family should be plural. {quote} http://r-pkgs.had.co.nz/man.html was (Author: yuu.ishik...@gmail.com): [~shivaram] It seems that the Rd files of dplyr and plyr are split by function. I asked the creator of roxygen2 whether we should split functions.Rd, and he answered that it would improve readability! And I think {{@family}} would be useful. h3. Suggestion - Split functions.Rd into one Rd file per function - Use {{@family}} to relate the functions in the API docs. It would be better to follow Scala's {{@group}} For example, {{add_months}} is documented: {noformat} # generic.R #' @rdname add_months #' @export setGeneric("add_months", function(y, x) { standardGeneric("add_months") }) # functions.R #' add_months #' #' Returns the date that is numMonths after startDate. #' #' @family datetime_funcs #' @rdname add_months #' @export setMethod("add_months", signature(y = "Column", x = "numeric"), function(y, x) { jc <- callJStatic("org.apache.spark.sql.functions", "add_months", y@jc, as.integer(x)) column(jc) }) {noformat} h3. The Rd files of dplyr and plyr - https://github.com/hadley/dplyr/tree/master/R - https://github.com/hadley/plyr/tree/master/man h3. Reference of {{@family}} {quote} If you have a family of related functions where every function should link to every other function in the family, use @family. The value of @family should be plural. {quote} http://r-pkgs.had.co.nz/man.html Improve SparkR API docs for 1.5 release --- Key: SPARK-10118 URL: https://issues.apache.org/jira/browse/SPARK-10118 Project: Spark Issue Type: Documentation Components: Documentation, SparkR Reporter: Shivaram Venkataraman This includes checking if the new DataFrame functions & expressions show up appropriately in the roxygen docs -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10118) Improve SparkR API docs for 1.5 release
[ https://issues.apache.org/jira/browse/SPARK-10118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14707631#comment-14707631 ] Yu Ishikawa commented on SPARK-10118: - [~shivaram] It seems that the Rd files of dplyr and plyr are split by function. I asked the creator of roxygen2 whether we should split functions.Rd, and he answered that it would improve readability! And I think {{@family}} would be useful. h3. Suggestion - Split functions.Rd into one Rd file per function - Use {{@family}} to relate the functions in the API docs. It would be better to follow Scala's {{@group}} For example, {{add_months}} is documented: {noformat} # generic.R #' @rdname add_months #' @export setGeneric("add_months", function(y, x) { standardGeneric("add_months") }) # functions.R #' add_months #' #' Returns the date that is numMonths after startDate. #' #' @family datetime_funcs #' @rdname add_months #' @export setMethod("add_months", signature(y = "Column", x = "numeric"), function(y, x) { jc <- callJStatic("org.apache.spark.sql.functions", "add_months", y@jc, as.integer(x)) column(jc) }) {noformat} h3. The Rd files of dplyr and plyr - https://github.com/hadley/dplyr/tree/master/R - https://github.com/hadley/plyr/tree/master/man h3. Reference of {{@family}} {quote} If you have a family of related functions where every function should link to every other function in the family, use @family. The value of @family should be plural. {quote} http://r-pkgs.had.co.nz/man.html Improve SparkR API docs for 1.5 release --- Key: SPARK-10118 URL: https://issues.apache.org/jira/browse/SPARK-10118 Project: Spark Issue Type: Documentation Components: Documentation, SparkR Reporter: Shivaram Venkataraman This includes checking if the new DataFrame functions & expressions show up appropriately in the roxygen docs -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9427) Add expression functions in SparkR
[ https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14702552#comment-14702552 ] Yu Ishikawa commented on SPARK-9427: Alright. Add expression functions in SparkR -- Key: SPARK-9427 URL: https://issues.apache.org/jira/browse/SPARK-9427 Project: Spark Issue Type: New Feature Components: SparkR Reporter: Yu Ishikawa The list of functions to add is based on SQL's functions. And it would be better to add them in one shot PR. https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-6813) SparkR style guide
[ https://issues.apache.org/jira/browse/SPARK-6813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14702556#comment-14702556 ] Yu Ishikawa commented on SPARK-6813: We did it. I appreciate your support. SparkR style guide -- Key: SPARK-6813 URL: https://issues.apache.org/jira/browse/SPARK-6813 Project: Spark Issue Type: New Feature Components: SparkR Reporter: Shivaram Venkataraman Assignee: Yu Ishikawa Fix For: 1.5.0 We should develop a SparkR style guide document based on the some of the guidelines we use and some of the best practices in R. Some examples of R style guide are: http://r-pkgs.had.co.nz/r.html#style http://google-styleguide.googlecode.com/svn/trunk/google-r-style.html A related issue is to work on a automatic style checking tool. https://github.com/jimhester/lintr seems promising We could have a R style guide based on the one from google [1], and adjust some of them with the conversation in Spark: 1. Line Length: maximum 100 characters 2. no limit on function name (API should be similar as in other languages) 3. Allow S4 objects/methods -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9427) Add expression functions in SparkR
[ https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14702563#comment-14702563 ] Yu Ishikawa commented on SPARK-9427: I see. Thank you for letting me know. Add expression functions in SparkR -- Key: SPARK-9427 URL: https://issues.apache.org/jira/browse/SPARK-9427 Project: Spark Issue Type: New Feature Components: SparkR Reporter: Yu Ishikawa The list of functions to add is based on SQL's functions. And it would be better to add them in one shot PR. https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10106) Add `ifelse` Column function to SparkR
Yu Ishikawa created SPARK-10106: --- Summary: Add `ifelse` Column function to SparkR Key: SPARK-10106 URL: https://issues.apache.org/jira/browse/SPARK-10106 Project: Spark Issue Type: New Feature Components: SparkR Reporter: Yu Ishikawa Add a column function on a DataFrame like `ifelse` in R to SparkR. I guess we could implement it with a combination of {{when}} and {{otherwise}}. h3. Example If {{df$x > 0}} is TRUE, then return 0, else return 1. {noformat} ifelse(df$x > 0, 0, 1) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10106) Add `ifelse` Column function to SparkR
[ https://issues.apache.org/jira/browse/SPARK-10106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Ishikawa updated SPARK-10106: Description: Add a column function on a DataFrame like `ifelse` in R to SparkR. I guess we could implement it with a combination of {{when}} and {{otherwise}}. h3. Example If {{df$x > 0}} is TRUE, then return 0, otherwise return 1. {noformat} ifelse(df$x > 0, 0, 1) {noformat} was: Add a column function on a DataFrame like `ifelse` in R to SparkR. I guess we could implement it with a combination of {{when}} and {{otherwise}}. h3. Example If {{df$x > 0}} is TRUE, then return 0, else return 1. {noformat} ifelse(df$x > 0, 0, 1) {noformat} Add `ifelse` Column function to SparkR -- Key: SPARK-10106 URL: https://issues.apache.org/jira/browse/SPARK-10106 Project: Spark Issue Type: New Feature Components: SparkR Reporter: Yu Ishikawa Add a column function on a DataFrame like `ifelse` in R to SparkR. I guess we could implement it with a combination of {{when}} and {{otherwise}}. h3. Example If {{df$x > 0}} is TRUE, then return 0, otherwise return 1. {noformat} ifelse(df$x > 0, 0, 1) {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
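For comparison, the composition proposed above already exists in PySpark's Column API; a sketch, assuming {{df}} is an existing DataFrame with a numeric column {{x}}:
{noformat}
from pyspark.sql import functions as F

# when/otherwise build a single conditional Column, which is exactly what
# the proposed SparkR ifelse(df$x > 0, 0, 1) would wrap.
flag = F.when(df["x"] > 0, 0).otherwise(1)
result = df.select(flag.alias("flag"))
{noformat}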
[jira] [Created] (SPARK-10079) Make `column` and `col` functions be S4 functions
Yu Ishikawa created SPARK-10079: --- Summary: Make `column` and `col` functions be S4 functions Key: SPARK-10079 URL: https://issues.apache.org/jira/browse/SPARK-10079 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa The {{column}} and {{col}} functions at {{R/pkg/R/Column.R}} are currently defined as S3 functions. I think it would be better to define them as S4 functions. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9972) Add `struct`, `encode` and `decode` function in SparkR
[ https://issues.apache.org/jira/browse/SPARK-9972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699237#comment-14699237 ] Yu Ishikawa commented on SPARK-9972: This is a quick note on the cause. When I tried to implement {{sort_array}}, I got the error as follows. I haven't inspected it, but the cause seems to be in {{collect}}. I'll comment on it in detail later. {noformat} 1. Error: sort_array on a DataFrame cannot coerce class jobj to a data.frame 1: withCallingHandlers(eval(code, new_test_environment), error = capture_calls, message = function(c) invokeRestart(muffleMessage), warning = function(c) invokeRestart(muffleWarning)) 2: eval(code, new_test_environment) 3: eval(expr, envir, enclos) 4: expect_equal(collect(select(df, sort_array(df$a)))[1, 1], c(1, 2, 3)) at test_sparkSQL.R:787 5: expect_that(object, equals(expected, label = expected.label, ...), info = info, label = label) 6: condition(object) 7: compare(expected, actual, ...) 8: compare.numeric(expected, actual, ...) 9: all.equal(x, y, ...) 10: all.equal.numeric(x, y, ...) 11: attr.all.equal(target, current, tolerance = tolerance, scale = scale, ...) 12: mode(current) 13: collect(select(df, sort_array(df$a))) 14: collect(select(df, sort_array(df$a))) 15: .local(x, ...) 16: do.call(cbind.data.frame, list(cols, stringsAsFactors = stringsAsFactors)) 17: (function (..., deparse.level = 1) data.frame(..., check.names = FALSE))(structure(list(`sort_array(a,true)` = list( environment, NA, NA)), .Names = sort_array(a,true)), stringsAsFactors = FALSE) 18: data.frame(..., check.names = FALSE) 19: as.data.frame(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors) 20: as.data.frame.list(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors) 21: eval(as.call(c(expression(data.frame), x, check.names = !optional, stringsAsFactors = stringsAsFactors))) 22: eval(expr, envir, enclos) 23: data.frame(`sort_array(a,true)` = list(environment, NA, NA), check.names = FALSE, stringsAsFactors = FALSE) 24: as.data.frame(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors) 25: as.data.frame.list(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors) 26: eval(as.call(c(expression(data.frame), x, check.names = !optional, stringsAsFactors = stringsAsFactors))) 27: eval(expr, envir, enclos) 28: data.frame(environment, NA, NA, check.names = FALSE, stringsAsFactors = FALSE) 29: as.data.frame(x[[i]], optional = TRUE) 30: as.data.frame.default(x[[i]], optional = TRUE) 31: stop(gettextf(cannot coerce class \%s\ to a data.frame, deparse(class(x))), domain = NA) 32: .handleSimpleError(function (e) { e$calls <- head(sys.calls()[-seq_len(frame + 7)], -2) signalCondition(e) }, cannot coerce class \\jobj\\ to a data.frame, quote(as.data.frame.default(x[[i]], optional = TRUE))) {noformat} Add `struct`, `encode` and `decode` function in SparkR -- Key: SPARK-9972 URL: https://issues.apache.org/jira/browse/SPARK-9972 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa Support {{struct}} function on a DataFrame in SparkR. However, I think we need to improve {{collect}} function in SparkR in order to implement {{struct}} function. - struct - encode - decode - array_contains - sort_array -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
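For reference, the behavior the failing assertion expects can be seen in PySpark, whose {{collect}} already handles array columns; a sketch, assuming a 1.5-era shell with {{sqlContext}} in scope:
{noformat}
from pyspark.sql import Row, functions as F

df = sqlContext.createDataFrame([Row(a=[2, 1, 3])])
# Analogue of the SparkR assertion: the collected cell should be [1, 2, 3].
df.select(F.sort_array(df["a"])).collect()
{noformat}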
[jira] [Created] (SPARK-10075) Add `when` expression function in SparkR
Yu Ishikawa created SPARK-10075: --- Summary: Add `when` expression function in SparkR Key: SPARK-10075 URL: https://issues.apache.org/jira/browse/SPARK-10075 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa Add the {{when}} function to SparkR. For this issue, we first need to implement {{when}}, {{otherwise}} and so on as {{Column}} methods. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10079) Make `column` and `col` functions be S4 functions
[ https://issues.apache.org/jira/browse/SPARK-10079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700721#comment-14700721 ] Yu Ishikawa commented on SPARK-10079: - I see. I misunderstood them in SparkR, since {{column}} and {{col}} in Scala are {{normal_funcs}}. If we should keep them private, I'll close this issue after [~davies]'s comment. Thanks! Make `column` and `col` functions be S4 functions - Key: SPARK-10079 URL: https://issues.apache.org/jira/browse/SPARK-10079 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa The {{column}} and {{col}} functions at {{R/pkg/R/Column.R}} are currently defined as S3 functions. I think it would be better to define them as S4 functions. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9856) Add expression functions into SparkR whose params are complicated
[ https://issues.apache.org/jira/browse/SPARK-9856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699124#comment-14699124 ] Yu Ishikawa commented on SPARK-9856: [~sunrui] Sorry, I'm working on this issue. I'll send a PR about this issue soon. Add expression functions into SparkR whose params are complicated - Key: SPARK-9856 URL: https://issues.apache.org/jira/browse/SPARK-9856 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa Add expression functions whose parameters are a little complicated, like {{regexp_extract(e: Column, exp: String, groupIdx: Int)}} and {{regexp_replace(e: Column, pattern: String, replacement: String)}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10043) Add window functions into SparkR
[ https://issues.apache.org/jira/browse/SPARK-10043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699058#comment-14699058 ] Yu Ishikawa commented on SPARK-10043: - [~shivaram] At least, the {{lead}} function doesn't work. I haven't checked all of them yet. I'll check them and get back to you. Add window functions into SparkR Key: SPARK-10043 URL: https://issues.apache.org/jira/browse/SPARK-10043 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa Add window functions as follows in SparkR. I think we should improve the {{collect}} function in SparkR. - lead - cumeDist - denseRank - lag - ntile - percentRank - rank - rowNumber -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9427) Add expression functions in SparkR
[ https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14699072#comment-14699072 ] Yu Ishikawa commented on SPARK-9427: [~shivaram] and [~davies] How do we convert R's {{integer}} type to Scala's {{Long}} type? I'm having trouble implementing the {{rand(seed: Long)}} function in SparkR. R's {{integer}} type is recognized as Scala's {{Int}}, and R's {{numeric}} type is recognized as Scala's {{Double}} type. So I wonder how I should deal with 64-bit integers in R. I think we should add {{rand(seed: Int)}} to spark.sql on the Scala side. What do you think? Plus, I guess PySpark's {{rand}} doesn't work on Python 2.x for the same reason, because {{int}} on Python 2.x is recognized as Scala's {{Integer}} type. Add expression functions in SparkR -- Key: SPARK-9427 URL: https://issues.apache.org/jira/browse/SPARK-9427 Project: Spark Issue Type: New Feature Components: SparkR Reporter: Yu Ishikawa The list of functions to add is based on SQL's functions. And it would be better to add them in one shot PR. https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10043) Add window functions into SparkR
Yu Ishikawa created SPARK-10043: --- Summary: Add window functions into SparkR Key: SPARK-10043 URL: https://issues.apache.org/jira/browse/SPARK-10043 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa Add window functions as follows in SparkR. I think we should improve the {{collect}} function in SparkR. - lead - cumeDist - denseRank - lag - ntile - percentRank - rank - rowNumber -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
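For reference, the PySpark analogues of these functions are applied over a window specification; a sketch, assuming {{df}} has columns {{group}}, {{ts}} and {{value}}:
{noformat}
from pyspark.sql import functions as F
from pyspark.sql.window import Window

# lead("value", 1) looks one row ahead within each partition, ordered by ts.
w = Window.partitionBy("group").orderBy("ts")
df.select("group", "ts", F.lead("value", 1).over(w).alias("next_value"))
{noformat}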
[jira] [Commented] (SPARK-10043) Add window functions into SparkR
[ https://issues.apache.org/jira/browse/SPARK-10043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14698994#comment-14698994 ] Yu Ishikawa commented on SPARK-10043: - As far as I know, there are no unit tests for the window functions in Scala / Python. At least, we should add unit tests in Python. Add window functions into SparkR Key: SPARK-10043 URL: https://issues.apache.org/jira/browse/SPARK-10043 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa Add window functions as follows in SparkR. I think we should improve the {{collect}} function in SparkR. - lead - cumeDist - denseRank - lag - ntile - percentRank - rank - rowNumber -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9871) Add expression functions into SparkR which have a variable parameter
[ https://issues.apache.org/jira/browse/SPARK-9871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696706#comment-14696706 ] Yu Ishikawa commented on SPARK-9871: I think it would be better to deal with {{struct}} in another issue, since {{collect}} doesn't work with a DataFrame which has a column of Struct type; we need to improve the {{collect}} method first. When I tried to implement the {{struct}} function, a struct column converted by {{dfToCols}} consisted of {{jobj}} values: {noformat} List of 1 $ structed:List of 2 ..$ :Class 'jobj' <environment: 0x7fd46efe4e68> ..$ :Class 'jobj' <environment: 0x7fd46efee078> {noformat} Add expression functions into SparkR which have a variable parameter Key: SPARK-9871 URL: https://issues.apache.org/jira/browse/SPARK-9871 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa Add expression functions into SparkR which have a variable parameter, like {{concat}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
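For comparison, PySpark's {{struct}} already round-trips through {{collect}} as nested {{Row}} values; a sketch, assuming {{df}} has columns {{a}} and {{b}}:
{noformat}
from pyspark.sql import functions as F

# struct("a", "b") packs the two columns into one struct-typed column.
# collect() returns each cell as a nested Row, which is what SparkR's
# collect would need to deserialize instead of leaving an opaque jobj.
df.select(F.struct("a", "b").alias("s")).collect()
{noformat}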
[jira] [Created] (SPARK-9972) Add `struct` function in SparkR
Yu Ishikawa created SPARK-9972: -- Summary: Add `struct` function in SparkR Key: SPARK-9972 URL: https://issues.apache.org/jira/browse/SPARK-9972 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa Support {{struct}} function on a DataFrame in SparkR. However, I think we need to improve {{collect}} function in SparkR in order to implement {{struct}} function. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9972) Add `struct` function in SparkR
[ https://issues.apache.org/jira/browse/SPARK-9972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Ishikawa updated SPARK-9972: --- Target Version/s: 1.6.0 (was: 1.5.0) Add `struct` function in SparkR --- Key: SPARK-9972 URL: https://issues.apache.org/jira/browse/SPARK-9972 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa Support {{struct}} function on a DataFrame in SparkR. However, I think we need to improve {{collect}} function in SparkR in order to implement {{struct}} function. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9972) Add `struct`, `encode` and `decode` function in SparkR
[ https://issues.apache.org/jira/browse/SPARK-9972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Ishikawa updated SPARK-9972: --- Summary: Add `struct`, `encode` and `decode` function in SparkR (was: Add `struct` function in SparkR) Add `struct`, `encode` and `decode` function in SparkR -- Key: SPARK-9972 URL: https://issues.apache.org/jira/browse/SPARK-9972 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa Support {{struct}} function on a DataFrame in SparkR. However, I think we need to improve {{collect}} function in SparkR in order to implement {{struct}} function. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9972) Add `struct`, `encode` and `decode` function in SparkR
[ https://issues.apache.org/jira/browse/SPARK-9972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Ishikawa updated SPARK-9972: --- Description: Support {{struct}} function on a DataFrame in SparkR. However, I think we need to improve {{collect}} function in SparkR in order to implement {{struct}} function. - struct - encode - decode - array_contains was:Support {{struct}} function on a DataFrame in SparkR. However, I think we need to improve {{collect}} function in SparkR in order to implement {{struct}} function. Add `struct`, `encode` and `decode` function in SparkR -- Key: SPARK-9972 URL: https://issues.apache.org/jira/browse/SPARK-9972 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa Support {{struct}} function on a DataFrame in SparkR. However, I think we need to improve {{collect}} function in SparkR in order to implement {{struct}} function. - struct - encode - decode - array_contains -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9972) Add `struct`, `encode` and `decode` function in SparkR
[ https://issues.apache.org/jira/browse/SPARK-9972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Ishikawa updated SPARK-9972: --- Description: Support {{struct}} function on a DataFrame in SparkR. However, I think we need to improve {{collect}} function in SparkR in order to implement {{struct}} function. - struct - encode - decode - array_contains - sort_array was: Support {{struct}} function on a DataFrame in SparkR. However, I think we need to improve {{collect}} function in SparkR in order to implement {{struct}} function. - struct - encode - decode - array_contains Add `struct`, `encode` and `decode` function in SparkR -- Key: SPARK-9972 URL: https://issues.apache.org/jira/browse/SPARK-9972 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa Support {{struct}} function on a DataFrame in SparkR. However, I think we need to improve {{collect}} function in SparkR in order to implement {{struct}} function. - struct - encode - decode - array_contains - sort_array -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10007) Update `NAMESPACE` file in SparkR for simple-parameter functions
Yu Ishikawa created SPARK-10007: --- Summary: Update `NAMESPACE` file in SparkR for simple-parameter functions Key: SPARK-10007 URL: https://issues.apache.org/jira/browse/SPARK-10007 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa I'm sorry that I forgot to update the {{NAMESPACE}} file for the simple-parameter functions, such as {{ascii}}, {{base64}} and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8240) string function: concat
[ https://issues.apache.org/jira/browse/SPARK-8240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696456#comment-14696456 ] Yu Ishikawa commented on SPARK-8240: [~rxin] I think it would be more natural to support arguments which consist of a mix of {{Column}} and {{String}}. What do you think? h4. Example. {noformat} concat(colA, ", ", colB, ", ", colC) {noformat} string function: concat --- Key: SPARK-8240 URL: https://issues.apache.org/jira/browse/SPARK-8240 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin Fix For: 1.5.0 concat(string|binary A, string|binary B...): string / binary Returns the string or bytes resulting from concatenating the strings or bytes passed in as parameters in order. For example, concat('foo', 'bar') results in 'foobar'. Note that this function can take any number of input strings. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8240) string function: concat
[ https://issues.apache.org/jira/browse/SPARK-8240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696477#comment-14696477 ] Yu Ishikawa commented on SPARK-8240: I think so; that is probably hard. However, from the user's point of view, it would be easier to write such expressions without {{lit}}. string function: concat --- Key: SPARK-8240 URL: https://issues.apache.org/jira/browse/SPARK-8240 Project: Spark Issue Type: Sub-task Components: SQL Reporter: Reynold Xin Assignee: Reynold Xin Fix For: 1.5.0 concat(string|binary A, string|binary B...): string / binary Returns the string or bytes resulting from concatenating the strings or bytes passed in as parameters in order. For example, concat('foo', 'bar') results in 'foobar'. Note that this function can take any number of input strings. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
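For context, this is the ceremony users face today in PySpark: literal separators must be wrapped in {{lit}} before {{concat}} accepts them (a sketch, assuming {{df}} has string columns {{a}} and {{b}}):
{noformat}
from pyspark.sql import functions as F

# Every argument to concat must be a Column, so the literal separator is
# wrapped in lit(); the proposal above would allow passing plain strings.
joined = df.select(F.concat(df["a"], F.lit(", "), df["b"]).alias("joined"))
{noformat}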
[jira] [Updated] (SPARK-9871) Add expression functions into SparkR which have a variable parameter
[ https://issues.apache.org/jira/browse/SPARK-9871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Ishikawa updated SPARK-9871: --- Summary: Add expression functions into SparkR which have a variable parameter (was: Add expression functions into SparkR which has a variable parameter) Add expression functions into SparkR which have a variable parameter Key: SPARK-9871 URL: https://issues.apache.org/jira/browse/SPARK-9871 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa Add expression functions into SparkR which have a variable parameter, like {{concat}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-9871) Add expression functions into SparkR which has a variable parameter
Yu Ishikawa created SPARK-9871: -- Summary: Add expression functions into SparkR which has a variable parameter Key: SPARK-9871 URL: https://issues.apache.org/jira/browse/SPARK-9871 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa Add expression functions into SparkR which has a variable parameter, like {{concat}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9427) Add expression functions in SparkR
[ https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692600#comment-14692600 ] Yu Ishikawa commented on SPARK-9427: [~shivaram] After all, I'd like to split this issue into a few sub-issues, since it is quite difficult to add the listed expressions at once and a little hard to review a single PR for all of them. I think we could classify them into at least three types in SparkR. What do you think? 1. Add expressions whose parameters are only {{(Column)}} or {{(Column, Column)}}, like {{md5(e: Column)}} 2. Add expressions whose parameters are a little complicated, like {{conv(num: Column, fromBase: Int, toBase: Int)}} 3. Add expressions which conflict with an existing R generic, like {{coalesce(e: Column*)}} {{1}} is not a difficult task: it is mostly extracting method definitions from the Scala code, and I think we rarely need to consider conflicts with the current SparkR code. However, {{2}} and {{3}} are a little harder because of the complexity. For example, in {{3}}, if we must modify an existing R generic for the new expressions, we should check whether the modification affects the existing code or not. Add expression functions in SparkR -- Key: SPARK-9427 URL: https://issues.apache.org/jira/browse/SPARK-9427 Project: Spark Issue Type: New Feature Components: SparkR Reporter: Yu Ishikawa The list of functions to add is based on SQL's functions. And it would be better to add them in one shot PR. https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
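To make the type-1 bucket concrete: those wrappers are mechanical enough to be generated from a plain list of names. A hypothetical Python sketch of the pattern (the SparkR change would do the equivalent in R; {{_invoke}} stands in for the real JVM bridge call and is not an actual API):
{noformat}
def _invoke(name, col):
    # Placeholder for the JVM call, e.g. the SparkR equivalent of
    # callJStatic("org.apache.spark.sql.functions", name, col@jc).
    raise NotImplementedError

def _make_unary(name):
    def f(col):
        return _invoke(name, col)
    f.__name__ = name
    f.__doc__ = "%s(col): thin wrapper over the Scala function of the same name." % name
    return f

# Type-1 functions all share the (Column) shape, so one loop covers them.
for _name in ["md5", "ascii", "base64", "crc32"]:
    globals()[_name] = _make_unary(_name)
{noformat}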
[jira] [Created] (SPARK-9855) Add expression functions into SparkR whose params are simple
Yu Ishikawa created SPARK-9855: -- Summary: Add expression functions into SparkR whose params are simple Key: SPARK-9855 URL: https://issues.apache.org/jira/browse/SPARK-9855 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa Add expression functions whose parameters are only {{(Column)}} or {{(Column, Column)}}, like {{md5}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-9857) Add expression functions into SparkR which conflict with existing R generics
Yu Ishikawa created SPARK-9857: -- Summary: Add expression functions into SparkR which conflict with existing R generics Key: SPARK-9857 URL: https://issues.apache.org/jira/browse/SPARK-9857 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa Add expression functions into SparkR which conflict with existing R generics, like {{coalesce(e: Column*)}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9427) Add expression functions in SparkR
[ https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692777#comment-14692777 ] Yu Ishikawa commented on SPARK-9427: [~shivaram] I haven't figured out the exact number for each type. However, I estimated them as follows; please note that these counts include the functions which have already been added to SparkR. 1 = 50; 2 and 3 = 51 Add expression functions in SparkR -- Key: SPARK-9427 URL: https://issues.apache.org/jira/browse/SPARK-9427 Project: Spark Issue Type: New Feature Components: SparkR Reporter: Yu Ishikawa The list of functions to add is based on SQL's functions. And it would be better to add them in one shot PR. https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-9427) Add expression functions in SparkR
[ https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692777#comment-14692777 ] Yu Ishikawa edited comment on SPARK-9427 at 8/12/15 2:02 AM: - [~shivaram] I don't figure out the number of each type. However, I estimated them as follows. Please be careful that these include the functions which have been added into SparkR. 1 = 50 2 and 3 = 51 was (Author: yuu.ishik...@gmail.com): [~shivaram] I don't figure out the number of each type. However, I estimated them as folows. Please be careful that it includes the functions which have been added into SparkR. 1 = 50 2 and 3 = 51 Add expression functions in SparkR -- Key: SPARK-9427 URL: https://issues.apache.org/jira/browse/SPARK-9427 Project: Spark Issue Type: New Feature Components: SparkR Reporter: Yu Ishikawa The list of functions to add is based on SQL's functions. And it would be better to add them in one shot PR. https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9427) Add expression functions in SparkR
[ https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14692607#comment-14692607 ] Yu Ishikawa commented on SPARK-9427: h3. Memo These are the expressions which we should add. (Including existing expressions) I extracted them from Scala's {{functions.scala}} with {{grep}}. {noformat} def abs(e: Column): Column def acos(columnName: String): Column def acos(e: Column): Column def add_months(startDate: Column, numMonths: Int): Column def approxCountDistinct(columnName: String): Column def approxCountDistinct(columnName: String, rsd: Double): Column def approxCountDistinct(e: Column): Column def approxCountDistinct(e: Column, rsd: Double): Column def array(colName: String, colNames: String*): Column def array(cols: Column*): Column def array_contains(column: Column, value: Any): Column def asc(columnName: String): Column def ascii(e: Column): Column def asin(columnName: String): Column def asin(e: Column): Column def atan(columnName: String): Column def atan(e: Column): Column def atan2(l: Column, r: Column): Column def atan2(l: Column, r: Double): Column def atan2(l: Column, rightName: String): Column def atan2(l: Double, r: Column): Column def atan2(l: Double, rightName: String): Column def atan2(leftName: String, r: Column): Column def atan2(leftName: String, r: Double): Column def atan2(leftName: String, rightName: String): Column def avg(columnName: String): Column def avg(e: Column): Column def base64(e: Column): Column def bin(columnName: String): Column def bin(e: Column): Column def bitwiseNOT(e: Column): Column def cbrt(columnName: String): Column def cbrt(e: Column): Column def ceil(columnName: String): Column def ceil(e: Column): Column def coalesce(e: Column*): Column def concat(exprs: Column*): Column def concat_ws(sep: String, exprs: Column*): Column def conv(num: Column, fromBase: Int, toBase: Int): Column def cos(columnName: String): Column def cos(e: Column): Column def cosh(columnName: String): Column def cosh(e: Column): Column def count(columnName: String): Column def count(e: Column): Column def countDistinct(columnName: String, columnNames: String*): Column def countDistinct(expr: Column, exprs: Column*): Column def crc32(e: Column): Column def cumeDist(): Column def current_date(): Column def current_timestamp(): Column def date_add(start: Column, days: Int): Column def date_format(dateExpr: Column, format: String): Column def date_sub(start: Column, days: Int): Column def datediff(end: Column, start: Column): Column def dayofmonth(e: Column): Column def dayofyear(e: Column): Column def decode(value: Column, charset: String): Column def denseRank(): Column def desc(columnName: String): Column def encode(value: Column, charset: String): Column def exp(columnName: String): Column def exp(e: Column): Column def explode(e: Column): Column def expm1(columnName: String): Column def expm1(e: Column): Column def expr(expr: String): Column def factorial(e: Column): Column def first(columnName: String): Column def first(e: Column): Column def floor(columnName: String): Column def floor(e: Column): Column def format_number(x: Column, d: Int): Column def format_string(format: String, arguments: Column*): Column def from_unixtime(ut: Column): Column def from_unixtime(ut: Column, f: String): Column def from_utc_timestamp(ts: Column, tz: String): Column def greatest(columnName: String, columnNames: String*): Column def greatest(exprs: Column*): Column def hex(column: Column): Column def hour(e: Column): 
Column def hypot(l: Column, r: Column): Column def hypot(l: Column, r: Double): Column def hypot(l: Column, rightName: String): Column def hypot(l: Double, r: Column): Column def hypot(l: Double, rightName: String): Column def hypot(leftName: String, r: Column): Column def hypot(leftName: String, r: Double): Column def hypot(leftName: String, rightName: String): Column def initcap(e: Column): Column def inputFileName(): Column def instr(str: Column, substring: String): Column def isNaN(e: Column): Column def lag(columnName: String, offset: Int): Column def lag(columnName: String, offset: Int, defaultValue: Any): Column def lag(e: Column, offset: Int): Column def lag(e: Column, offset: Int, defaultValue: Any): Column def last(columnName: String): Column def last(e: Column): Column def last_day(e: Column): Column def lead(columnName: String, offset: Int): Column def lead(columnName: String, offset: Int, defaultValue: Any): Column def lead(e: Column, offset: Int): Column def lead(e: Column, offset: Int, defaultValue: Any): Column def least(columnName: String, columnNames: String*): Column def least(exprs: Column*): Column def length(e: Column): Column def levenshtein(l: Column, r: Column): Column def lit(literal: Any): Column def locate(substr: String, str: Column): Column def locate(substr: String, str: Column, pos: Int): Column def log(base: Double, a: Column): Column def
[jira] [Created] (SPARK-9856) Add expression functions into SparkR whose params are complicated
Yu Ishikawa created SPARK-9856: -- Summary: Add expression functions into SparkR whose params are complicated Key: SPARK-9856 URL: https://issues.apache.org/jira/browse/SPARK-9856 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa Add expression functions whose parameters are a little complicated, like {{regexp_extract(e: Column, exp: String, groupIdx: Int)}} and {{regexp_replace(e: Column, pattern: String, replacement: String)}}. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8505) Add settings to kick `lint-r` from `./dev/run-test.py`
[ https://issues.apache.org/jira/browse/SPARK-8505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14650233#comment-14650233 ] Yu Ishikawa commented on SPARK-8505: [~srowen] Yes, I understand how issues are assigned, but I thought it would be better to show my activity to the other developers. I'll be more careful next time. Thanks! Add settings to kick `lint-r` from `./dev/run-test.py` -- Key: SPARK-8505 URL: https://issues.apache.org/jira/browse/SPARK-8505 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa Add some settings to kick the `lint-r` script from `./dev/run-test.py` -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8505) Add settings to kick `lint-r` from `./dev/run-test.py`
[ https://issues.apache.org/jira/browse/SPARK-8505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14649839#comment-14649839 ] Yu Ishikawa commented on SPARK-8505: Could you assign this issue to me? Add settings to kick `lint-r` from `./dev/run-test.py` -- Key: SPARK-8505 URL: https://issues.apache.org/jira/browse/SPARK-8505 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa Add some settings to kick the `lint-r` script from `./dev/run-test.py` -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-9427) Add expression functions in SparkR
[ https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645464#comment-14645464 ] Yu Ishikawa edited comment on SPARK-9427 at 7/29/15 11:33 PM: -- I'll work on this issue. was (Author: yuu.ishik...@gmail.com): I'll work this issue. Add expression functions in SparkR -- Key: SPARK-9427 URL: https://issues.apache.org/jira/browse/SPARK-9427 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa The list of functions to add is based on SQL's functions, and it would be better to add them in a one-shot PR. https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-8917) Add @since tags to mllib.linalg
[ https://issues.apache.org/jira/browse/SPARK-8917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645248#comment-14645248 ] Yu Ishikawa commented on SPARK-8917: Please assign this issue to me. Add @since tags to mllib.linalg --- Key: SPARK-8917 URL: https://issues.apache.org/jira/browse/SPARK-8917 Project: Spark Issue Type: Sub-task Components: Documentation, MLlib Reporter: Xiangrui Meng Priority: Minor Labels: starter Original Estimate: 2h Remaining Estimate: 2h -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-9427) Add expression functions in SparkR
Yu Ishikawa created SPARK-9427: -- Summary: Add expression functions in SparkR Key: SPARK-9427 URL: https://issues.apache.org/jira/browse/SPARK-9427 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa The list of functions to add is based on SQL's functions, and it would be better to add them in a one-shot PR. https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9427) Add expression functions in SparkR
[ https://issues.apache.org/jira/browse/SPARK-9427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645464#comment-14645464 ] Yu Ishikawa commented on SPARK-9427: I'll work this issue. Add expression functions in SparkR -- Key: SPARK-9427 URL: https://issues.apache.org/jira/browse/SPARK-9427 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa The list of functions to add is based on SQL's functions, and it would be better to add them in a one-shot PR. https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-9248) Closing curly-braces should always be on their own line
[ https://issues.apache.org/jira/browse/SPARK-9248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Ishikawa updated SPARK-9248: --- Description: Closing curly-braces should always be on their own line. For example:
{noformat}
inst/tests/test_sparkSQL.R:606:3: style: Closing curly-braces should always be on their own line, unless it's followed by an else.
  }, error = function(err) {
  ^
{noformat}
was:Closing curly-braces should always be on their own line Closing curly-braces should always be on their own line --- Key: SPARK-9248 URL: https://issues.apache.org/jira/browse/SPARK-9248 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa Priority: Minor Closing curly-braces should always be on their own line. For example:
{noformat}
inst/tests/test_sparkSQL.R:606:3: style: Closing curly-braces should always be on their own line, unless it's followed by an else.
  }, error = function(err) {
  ^
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9248) Closing curly-braces should always be on their own line
[ https://issues.apache.org/jira/browse/SPARK-9248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14640074#comment-14640074 ] Yu Ishikawa commented on SPARK-9248: Yeah, sorry for not explaining enough. {{dev/lint-r}} doesn't currently catch the warnings about {{\} else \{}}. There are a few warnings like the one above. Closing curly-braces should always be on their own line --- Key: SPARK-9248 URL: https://issues.apache.org/jira/browse/SPARK-9248 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa Priority: Minor Closing curly-braces should always be on their own line. For example:
{noformat}
inst/tests/test_sparkSQL.R:606:3: style: Closing curly-braces should always be on their own line, unless it's followed by an else.
  }, error = function(err) {
  ^
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
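For illustration, a hypothetical R snippet (not taken from the Spark codebase) showing what the rule flags and what it allows:
{noformat}
# Flagged: the closing brace shares its line with the next argument.
tryCatch({
  result <- doSomething()
}, error = function(err) {
  stop(err)
})

# Allowed: a closing brace may share its line only with `else`.
if (x > 0) {
  y <- 1
} else {
  y <- 2
}
{noformat}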
[jira] [Updated] (SPARK-9249) local variable assigned but may not be used
[ https://issues.apache.org/jira/browse/SPARK-9249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Ishikawa updated SPARK-9249: --- Description: local variable assigned but may not be used. For example:
{noformat}
R/deserialize.R:105:3: warning: local variable ‘data’ assigned but may not be used
  data <- readBin(con, raw(), as.integer(dataLen), endian = "big")
  ^~~~
R/deserialize.R:109:3: warning: local variable ‘data’ assigned but may not be used
  data <- readBin(con, raw(), as.integer(dataLen), endian = "big")
  ^~~~
{noformat}
was:local variable assigned but may not be used local variable assigned but may not be used --- Key: SPARK-9249 URL: https://issues.apache.org/jira/browse/SPARK-9249 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa Priority: Minor local variable assigned but may not be used. For example:
{noformat}
R/deserialize.R:105:3: warning: local variable ‘data’ assigned but may not be used
  data <- readBin(con, raw(), as.integer(dataLen), endian = "big")
  ^~~~
R/deserialize.R:109:3: warning: local variable ‘data’ assigned but may not be used
  data <- readBin(con, raw(), as.integer(dataLen), endian = "big")
  ^~~~
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
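A hypothetical sketch of the kind of fix involved: when the value is read only to consume bytes from the connection, the unused assignment can simply be dropped (the function name here is illustrative, not the actual {{deserialize.R}} code):
{noformat}
# Before: `data` is assigned but never used, so lintr warns.
skipRaw <- function(con, dataLen) {
  data <- readBin(con, raw(), as.integer(dataLen), endian = "big")
}

# After: the read (and its side effect on the connection) is kept,
# but the dead assignment is gone.
skipRaw <- function(con, dataLen) {
  readBin(con, raw(), as.integer(dataLen), endian = "big")
}
{noformat}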
[jira] [Comment Edited] (SPARK-9249) local variable assigned but may not be used
[ https://issues.apache.org/jira/browse/SPARK-9249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639945#comment-14639945 ] Yu Ishikawa edited comment on SPARK-9249 at 7/24/15 5:22 AM: - [~chanchal.spark] Yes. I think we should remove local variables which are not used, such as below. https://github.com/apache/spark/blob/branch-1.4/R/pkg/R/deserialize.R#L104 was (Author: yuu.ishik...@gmail.com): [~chanchal.spark] Yes. I think we should remove local variables which is not used, such as below. https://github.com/apache/spark/blob/branch-1.4/R/pkg/R/deserialize.R#L104 local variable assigned but may not be used --- Key: SPARK-9249 URL: https://issues.apache.org/jira/browse/SPARK-9249 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa Priority: Minor local variable assigned but may not be used. For example:
{noformat}
R/deserialize.R:105:3: warning: local variable ‘data’ assigned but may not be used
  data <- readBin(con, raw(), as.integer(dataLen), endian = "big")
  ^~~~
R/deserialize.R:109:3: warning: local variable ‘data’ assigned but may not be used
  data <- readBin(con, raw(), as.integer(dataLen), endian = "big")
  ^~~~
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9249) local variable assigned but may not be used
[ https://issues.apache.org/jira/browse/SPARK-9249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639945#comment-14639945 ] Yu Ishikawa commented on SPARK-9249: [~chanchal.spark] Yes. I think we should remove local variables which is not used, such as below. https://github.com/apache/spark/blob/branch-1.4/R/pkg/R/deserialize.R#L104 local variable assigned but may not be used --- Key: SPARK-9249 URL: https://issues.apache.org/jira/browse/SPARK-9249 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa Priority: Minor local variable assigned but may not be used. For example:
{noformat}
R/deserialize.R:105:3: warning: local variable ‘data’ assigned but may not be used
  data <- readBin(con, raw(), as.integer(dataLen), endian = "big")
  ^~~~
R/deserialize.R:109:3: warning: local variable ‘data’ assigned but may not be used
  data <- readBin(con, raw(), as.integer(dataLen), endian = "big")
  ^~~~
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-9249) local variable assigned but may not be used
[ https://issues.apache.org/jira/browse/SPARK-9249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639955#comment-14639955 ] Yu Ishikawa commented on SPARK-9249: I'm working on this issue. local variable assigned but may not be used --- Key: SPARK-9249 URL: https://issues.apache.org/jira/browse/SPARK-9249 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa Priority: Minor local variable assigned but may not be used. For example:
{noformat}
R/deserialize.R:105:3: warning: local variable ‘data’ assigned but may not be used
  data <- readBin(con, raw(), as.integer(dataLen), endian = "big")
  ^~~~
R/deserialize.R:109:3: warning: local variable ‘data’ assigned but may not be used
  data <- readBin(con, raw(), as.integer(dataLen), endian = "big")
  ^~~~
{noformat}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-9248) Closing curly-braces should always be on their own line
Yu Ishikawa created SPARK-9248: -- Summary: Closing curly-braces should always be on their own line Key: SPARK-9248 URL: https://issues.apache.org/jira/browse/SPARK-9248 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa Priority: Minor Closing curly-braces should always be on their own line -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-9249) local variable assigned but may not be used
Yu Ishikawa created SPARK-9249: -- Summary: local variable assigned but may not be used Key: SPARK-9249 URL: https://issues.apache.org/jira/browse/SPARK-9249 Project: Spark Issue Type: Sub-task Components: SparkR Reporter: Yu Ishikawa Priority: Minor local variable assigned but may not be used -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org