from:"WeichenXu123"

[GitHub] spark issue #13558: [SPARK-15820][PySpark][SQL]Add Catalog.refreshTable into...

2016-06-08 Thread WeichenXu123

Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/13558 Jenkins, test this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark pull request #13538: [MINOR] fix typo in documents

2016-06-06 Thread WeichenXu123

GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/13538 [MINOR] fix typo in documents ## What changes were proposed in this pull request? I used spell-check tools to find typos in the Spark documents and fixed them. ## How was this patch

[GitHub] spark pull request #13578: [SPARK-15837][ML][PySpark]Word2vec python add max...

2016-06-09 Thread WeichenXu123

GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/13578 [SPARK-15837][ML][PySpark]Word2vec python add maxsentence parameter ## What changes were proposed in this pull request? Word2vec python add maxsentence parameter. ## How

[GitHub] spark issue #13578: [SPARK-15837][ML][PySpark]Word2vec python add maxsentenc...

2016-06-09 Thread WeichenXu123

Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/13578 @srowen Hi srowen, I have another similar PR, #13558, which passed the tests on my machine, but the official test fails. It seems to be the test server's problem; can you help to check

[GitHub] spark issue #13544: [SPARK-15805][SQL][Documents] update sql programming gui...

2016-06-07 Thread WeichenXu123

Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/13544 @rxin A small problem: in `HiveContext` there is a method `refreshTable` for refreshing the metadata of a Hive table. Now, using the new SparkSession API with Hive support, the method

[GitHub] spark pull request #13558: [SPARK-15820][pyspark][SQL] update python sql int...

2016-06-08 Thread WeichenXu123

GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/13558 [SPARK-15820][pyspark][SQL] update python sql interface refreshTable ## What changes were proposed in this pull request? Add Catalog.refreshTable API into python interface for Spark

[GitHub] spark issue #13558: [SPARK-15820][PySpark][SQL]Add Catalog.refreshTable into...

2016-06-08 Thread WeichenXu123

Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/13558 Jenkins, test this please.

[GitHub] spark issue #13558: [SPARK-15820][PySpark][SQL]Add Catalog.refreshTable into...

2016-06-08 Thread WeichenXu123

Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/13558 Jenkins, test this please.

[GitHub] spark issue #13558: [SPARK-15820][PySpark][SQL]Add Catalog.refreshTable into...

2016-06-08 Thread WeichenXu123

Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/13558 Jenkins, test this please.

[GitHub] spark pull request #13538: [MINOR] fix typo in documents

2016-06-07 Thread WeichenXu123

Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/13538#discussion_r66059465 --- Diff: docs/streaming-programming-guide.md --- @@ -2037,7 +2037,7 @@ and configuring them to receive different partitions of the data stream from

[GitHub] spark issue #13538: [MINOR] fix typo in documents

2016-06-07 Thread WeichenXu123

Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/13538 @srowen Yes, I checked each md file, and I think it is done.

[GitHub] spark issue #13381: [SPARK-15608][ml][doc] add_isotonic_regression_doc

2016-06-06 Thread WeichenXu123

Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/13381 @yanboliang Done.

[GitHub] spark pull request #13525: [MINOR]fix typo a -> an

2016-06-06 Thread WeichenXu123

GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/13525 [MINOR]fix typo a -> an ## What changes were proposed in this pull request? a->an similar to #13515 Use cmds like `find . -name '*.R' | xargs -i sh -c "grep -in

[GitHub] spark pull request #13544: [SPARK-15805][SQL][Documents] update sql programm...

2016-06-07 Thread WeichenXu123

GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/13544 [SPARK-15805][SQL][Documents] update sql programming guide ## What changes were proposed in this pull request? Update the whole sql programming guide doc file, including: update

[GitHub] spark issue #13544: [SPARK-15805][SQL][Documents] update sql programming gui...

2016-06-07 Thread WeichenXu123

Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/13544 @rxin

[GitHub] spark issue #13558: [SPARK-15820][PySpark][SQL]Add Catalog.refreshTable into...

2016-06-12 Thread WeichenXu123

Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/13558 Jenkins, test this please.

[GitHub] spark issue #13544: [SPARK-15805][SQL][Documents] update sql programming gui...

2016-06-10 Thread WeichenXu123

Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/13544 @liancheng OK, no problem!

[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...

2016-06-10 Thread WeichenXu123

Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/13592#discussion_r66699400 --- Diff: docs/sql-programming-guide.md --- @@ -1607,13 +1600,13 @@ a regular multi-line JSON file will most often fail. {% highlight r

[GitHub] spark pull request #13381: [SPARK-15608][ml][examples][doc] add examples and...

2016-06-10 Thread WeichenXu123

Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/13381#discussion_r66700697 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/IsotonicRegressionExample.scala --- @@ -0,0 +1,62 @@ +/* + * Licensed

[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...

2016-06-10 Thread WeichenXu123

Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/13592#discussion_r66700281 --- Diff: docs/sql-programming-guide.md --- @@ -517,24 +517,26 @@ types such as Sequences or Arrays. This RDD can be implicitly converted to a Dat

[GitHub] spark issue #13558: [SPARK-15820][PySpark][SQL]Add Catalog.refreshTable into...

2016-06-09 Thread WeichenXu123

Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/13558 Jenkins, test this please.

[GitHub] spark pull request #13381: [SPARK-15608][ml][examples][doc] add examples and...

2016-06-11 Thread WeichenXu123

Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/13381#discussion_r66715565 --- Diff: docs/ml-classification-regression.md --- @@ -685,6 +685,76 @@ The implementation matches the result from R's survival function

[GitHub] spark pull request #13381: [SPARK-15608][ml][examples][doc] add examples and...

2016-06-14 Thread WeichenXu123

Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/13381#discussion_r66961848 --- Diff: examples/src/main/python/mllib/isotonic_regression_example.py --- @@ -23,18 +23,22 @@ from pyspark import SparkContext # $example

[GitHub] spark issue #13558: [SPARK-15820][PySpark][SQL]Add Catalog.refreshTable into...

2016-06-14 Thread WeichenXu123

Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/13558 @andrewor14 It looks strange; I tested on my own machine and it is all OK. Is it the test server's problem?

[GitHub] spark issue #13525: [MINOR]fix typo a -> an

2016-06-06 Thread WeichenXu123

Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/13525 @srowen OK, so the command doesn't work correctly.

[GitHub] spark pull request #13525: [MINOR]fix typo a -> an

2016-06-06 Thread WeichenXu123

Github user WeichenXu123 closed the pull request at: https://github.com/apache/spark/pull/13525

[GitHub] spark issue #13558: [SPARK-15820][PySpark][SQL]Add Catalog.refreshTable into...

2016-06-08 Thread WeichenXu123

Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/13558 Jenkins, test this please.

[GitHub] spark pull request: [SPARK-15608][ml][doc] add_isotonic_regression...

2016-05-28 Thread WeichenXu123

GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/13381 [SPARK-15608][ml][doc] add_isotonic_regression_doc ## What changes were proposed in this pull request? add ml doc for ml isotonic regression add scala example for ml isotonic

[GitHub] spark pull request: [SPARK-15608][ml][doc] add_isotonic_regression...

2016-05-29 Thread WeichenXu123

Github user WeichenXu123 commented on the pull request: https://github.com/apache/spark/pull/13381#issuecomment-222353670 @holdenk Java & python example added. Thanks!

[GitHub] spark pull request: [SPARK-15533][SQL]Deprecate Dataset.explode

2016-05-25 Thread WeichenXu123

GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/13313 [SPARK-15533][SQL]Deprecate Dataset.explode ## What changes were proposed in this pull request? Deprecate Dataset.explode ## How was this patch tested? Existing

[GitHub] spark pull request: [SPARK-15533][SQL]Deprecate Dataset.explode

2016-05-25 Thread WeichenXu123

Github user WeichenXu123 closed the pull request at: https://github.com/apache/spark/pull/13313

[GitHub] spark pull request: [SPARK-15499][PySpark][Tests] Add python tests...

2016-05-27 Thread WeichenXu123

Github user WeichenXu123 commented on the pull request: https://github.com/apache/spark/pull/13275#issuecomment-222069163 @jkbradley --modules='pyspark-ml' will run a bunch of tests in the pyspark subdirectory in parallel, not a single python file, and my purpose is to add a way to debug

[GitHub] spark pull request #13441: [SPARK-15702][Documentation]Update document progr...

2016-06-01 Thread WeichenXu123

GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/13441 [SPARK-15702][Documentation]Update document programming-guide accumulator section ## What changes were proposed in this pull request? Update document programming-guide accumulator

[GitHub] spark pull request #12987: [spark-15212][SQL]CSV file reader when read file ...

2016-06-01 Thread WeichenXu123

Github user WeichenXu123 closed the pull request at: https://github.com/apache/spark/pull/12987

[GitHub] spark pull request: [SPARK-15670][Java API][Spark Core]label_accumulator_dep...

2016-05-31 Thread WeichenXu123

GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/13412 [SPARK-15670][Java API][Spark Core]label_accumulator_deprecate_in_java_spark_context ## What changes were proposed in this pull request? Add a deprecation annotation for accumulator V1

[GitHub] spark issue #13558: [SPARK-15820][PySpark][SQL]Add Catalog.refreshTable into...

2016-06-19 Thread WeichenXu123

Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/13558 @andrewor14 The PR is OK now.

[GitHub] spark pull request #13544: [SPARK-15805][SQL][Documents] update sql programm...

2016-06-27 Thread WeichenXu123

Github user WeichenXu123 closed the pull request at: https://github.com/apache/spark/pull/13544

[GitHub] spark pull request #13558: [SPARK-15820][PySpark][SQL]Add Catalog.refreshTab...

2016-06-17 Thread WeichenXu123

Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/13558#discussion_r67496513 --- Diff: python/pyspark/sql/catalog.py --- @@ -232,6 +232,11 @@ def clearCache(self): """Removes all cached tables fro

[GitHub] spark pull request: [SPARK-15203][Deploy]The spark daemon shell sc...

2016-05-19 Thread WeichenXu123

Github user WeichenXu123 commented on the pull request: https://github.com/apache/spark/pull/13172#issuecomment-220260517 @srowen Modified as you expected.

[GitHub] spark pull request: [SPARK-15461][tests][pyspark]modify python tes...

2016-05-20 Thread WeichenXu123

GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/13240 [SPARK-15461][tests][pyspark]modify python test script using version default 2.7 ## What changes were proposed in this pull request? update the default python version used in python

[GitHub] spark pull request: [SPARK-15464][ML][MLlib][SQL][Tests] Replace S...

2016-05-22 Thread WeichenXu123

Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/13242#discussion_r64147805 --- Diff: python/pyspark/ml/clustering.py --- @@ -933,21 +933,20 @@ def getKeepLastCheckpoint(self): if __name__ == "__main__":

[GitHub] spark pull request: [SPARK-15461][tests][pyspark]modify python tes...

2016-05-22 Thread WeichenXu123

Github user WeichenXu123 closed the pull request at: https://github.com/apache/spark/pull/13240

[GitHub] spark pull request: [SPARK-15226][SQL]fix CSV file data-line with ...

2016-05-22 Thread WeichenXu123

Github user WeichenXu123 closed the pull request at: https://github.com/apache/spark/pull/13007

[GitHub] spark pull request: [SPARK-15499][PySpark][Tests] Add python tests...

2016-05-24 Thread WeichenXu123

Github user WeichenXu123 commented on the pull request: https://github.com/apache/spark/pull/13275#issuecomment-221203315 Jenkins, test this please.

[GitHub] spark pull request: [SPARK-15499][PySpark][Tests] Add python tests...

2016-05-24 Thread WeichenXu123

GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/13275 [SPARK-15499][PySpark][Tests] Add python testsuite with remote debug and single test parameter to help developer debug code easier ## What changes were proposed in this pull request

[GitHub] spark pull request: [SPARK-15499][PySpark][Tests] Add python tests...

2016-05-24 Thread WeichenXu123

Github user WeichenXu123 commented on the pull request: https://github.com/apache/spark/pull/13275#issuecomment-221210606 @davies What do you think about it?

[GitHub] spark pull request: [SPARK-15464][ML][MLlib][SQL][Tests] Replace S...

2016-05-21 Thread WeichenXu123

GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/13242 [SPARK-15464][ML][MLlib][SQL][Tests] Replace SQLContext and SparkContext with SparkSession using builder pattern in python test code ## What changes were proposed in this pull request

[GitHub] spark pull request: [SPARK-15446][build][sql] modify catalyst usin...

2016-05-20 Thread WeichenXu123

Github user WeichenXu123 closed the pull request at: https://github.com/apache/spark/pull/13224

[GitHub] spark pull request: [SPARK-15446][build][sql] modify catalyst usin...

2016-05-20 Thread WeichenXu123

GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/13224 [SPARK-15446][build][sql] modify catalyst using longValueExact not supporting java 7 ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix

[GitHub] spark pull request: [SPARK-15350][mllib]add unit test function for...

2016-05-16 Thread WeichenXu123

GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/13136 [SPARK-15350][mllib]add unit test function for LogisticRegressionWithLBFGS in JavaLogisticRegressionSuite ## What changes were proposed in this pull request? add unit test function

[GitHub] spark pull request: [SPARK-15203][Deploy]The spark daemon shell sc...

2016-05-18 Thread WeichenXu123

Github user WeichenXu123 commented on the pull request: https://github.com/apache/spark/pull/12978#issuecomment-220021066 In order to check this potential problem more carefully, we can add test code like the following: ` echo "$newpid" > "$pid"

[GitHub] spark pull request: [SPARK-15203][Deploy]The spark daemon shell sc...

2016-05-18 Thread WeichenXu123

Github user WeichenXu123 commented on the pull request: https://github.com/apache/spark/pull/12978#issuecomment-220016724 @srowen Following your suggestion, I added a loop to check whether it passes STAGE-1 and launches the java daemon. And I restored the check statement `! $(ps -p "$n

[GitHub] spark pull request: [SPARK-15203][Deploy]The spark daemon shell sc...

2016-05-18 Thread WeichenXu123

Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/12978#discussion_r63700865 --- Diff: sbin/spark-daemon.sh --- @@ -162,6 +162,15 @@ run_command() { esac echo "$newpid" > "$pid" +

[GitHub] spark pull request: [SPARK-15322][mllib][core][sql]update deprecat...

2016-05-18 Thread WeichenXu123

Github user WeichenXu123 commented on the pull request: https://github.com/apache/spark/pull/13112#issuecomment-219980697 @srowen updated. Seems no problem.

[GitHub] spark pull request: [SPARK-15203][Deploy]The spark daemon shell sc...

2016-05-18 Thread WeichenXu123

Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/12978#discussion_r63708450 --- Diff: sbin/spark-daemon.sh --- @@ -162,6 +162,15 @@ run_command() { esac echo "$newpid" > "$pid" +

[GitHub] spark pull request: [SPARK-15203][Deploy]fix bug SPARK-15203

2016-05-18 Thread WeichenXu123

GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/13172 [SPARK-15203][Deploy]fix bug SPARK-15203 ## What changes were proposed in this pull request? fix bug SPARK-15203 ## How was this patch tested? existing test

[GitHub] spark pull request: [SPARK-15203][Deploy]The spark daemon shell sc...

2016-05-18 Thread WeichenXu123

Github user WeichenXu123 closed the pull request at: https://github.com/apache/spark/pull/12978

[GitHub] spark pull request: [SPARK-15322][mllib]update deprecate accumulat...

2016-05-14 Thread WeichenXu123

GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/13112 [SPARK-15322][mllib]update deprecate accumulator usage into accumulatorV2 in mllib ## What changes were proposed in this pull request? MLlib code has two places that use sc.accumulator

[GitHub] spark pull request: [SPARK-15322][mllib]update deprecate accumulat...

2016-05-15 Thread WeichenXu123

Github user WeichenXu123 commented on the pull request: https://github.com/apache/spark/pull/13112#issuecomment-219291622 @srowen I used IntelliJ IDEA to search for usages of the deprecated SparkContext.accumulator in the whole spark project, and updated the code (except those test code

[GitHub] spark pull request: [SPARK-15203][Deploy]The spark daemon shell sc...

2016-05-10 Thread WeichenXu123

Github user WeichenXu123 commented on the pull request: https://github.com/apache/spark/pull/12978#issuecomment-218089784 @srowen On my virtual machine, after the OS has started and I start the spark daemon, the stage 1 described above takes a long time, often exceeding 2s. I think the java

[GitHub] spark pull request: [SPARK-15226][SQL]fix CSV file data-line with ...

2016-05-10 Thread WeichenXu123

Github user WeichenXu123 commented on the pull request: https://github.com/apache/spark/pull/13007#issuecomment-218094762 @HyukjinKwon Well, the current CSV loading code uses Hadoop `LineRecordReader`, so it does not allow a row to be split across multiple lines, so I think the code should disable csv multi

[GitHub] spark pull request: fix CSV file data-line with newline at first l...

2016-05-09 Thread WeichenXu123

GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/13007 fix CSV file data-line with newline at first line load error ## What changes were proposed in this pull request? fix CSV file data-line with newline at first line load error

[GitHub] spark pull request: [SPARK-15226][SQL]fix CSV file data-line with ...

2016-05-09 Thread WeichenXu123

Github user WeichenXu123 commented on the pull request: https://github.com/apache/spark/pull/13007#issuecomment-218037385 @HyukjinKwon I ran the existing tests against this patch and they all pass. If needed I will add a new test in CSVSuite. And I think the only reason that causes the bug

[GitHub] spark pull request: [SPARK-15203][Deploy]The spark daemon shell sc...

2016-05-09 Thread WeichenXu123

Github user WeichenXu123 commented on the pull request: https://github.com/apache/spark/pull/12978#issuecomment-218040874 @srowen Because spark-daemon.sh uses the exec command to start the java daemon process, when the script is run the spark-daemon.sh process will exist for a little

[GitHub] spark pull request: [SPARK-15203][Deploy]The spark daemon shell sc...

2016-05-10 Thread WeichenXu123

Github user WeichenXu123 commented on the pull request: https://github.com/apache/spark/pull/12978#issuecomment-218097327 @srowen Er... I also find it a little strange that stage 1 may take a long time, but it really did happen several times... If there is time I will do a more detailed test

[GitHub] spark pull request: [spark-15212][SQL]CSV file reader when read fi...

2016-05-08 Thread WeichenXu123

GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/12987 [spark-15212][SQL]CSV file reader when read file with first line schema do not filter blank in schema column name ## What changes were proposed in this pull request? When load csv

[GitHub] spark pull request: [SPARK-15203][Deploy]The spark daemon shell sc...

2016-05-07 Thread WeichenXu123

GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/12978 [SPARK-15203][Deploy]The spark daemon shell script error, daemon process start successfully but script output fail message. This bug is because, sbin/spark-daemon.sh script use bin/spark

[GitHub] spark issue #14333: [SPARK-16696][ML][MLLib] destroy KMeans bcNewCenters whe...

2016-07-25 Thread WeichenXu123

Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14333 @srowen The SparkContext, by default, runs a cleaner in the background to release unreferenced RDDs/broadcasts. But I think we had better release them ourselves because

[GitHub] spark issue #14335: [SPARK-16697][ML][MLLib] improve LDA submitMiniBatch met...

2016-07-25 Thread WeichenXu123

Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14335 @srowen `stats.unpersist(false)` ==> `stats.unpersist()` updated. Is there anything else that needs updating?

[GitHub] spark issue #14333: [SPARK-16696][ML][MLLib] destroy KMeans bcNewCenters whe...

2016-07-26 Thread WeichenXu123

Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14333 @srowen I checked the places where `RDD.persist` is referenced: AFTSurvivalRegression, LinearRegression, and LogisticRegression persist the input training RDD and unpersist it when `train` returns

[GitHub] spark pull request #14203: update python dataframe.drop

2016-07-14 Thread WeichenXu123

GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/14203 update python dataframe.drop ## What changes were proposed in this pull request? Make `dataframe.drop` API in python support multi-columns parameters, so that it is the same

[GitHub] spark issue #14333: [SPARK-16696][ML][MLLib] destroy KMeans bcNewCenters whe...

2016-07-25 Thread WeichenXu123

Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14333 yeah, but the `bcSyn0Global` in Word2Vec is a different case; it looks safe to destroy there, because in each loop iteration, the RDD transform which uses `bcSyn0Global` ends

[GitHub] spark pull request #14335: [SPARK-16697][ML][MLLib] improve LDA submitMiniBa...

2016-07-24 Thread WeichenXu123

Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/14335#discussion_r72003627 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -472,12 +473,13 @@ final class OnlineLDAOptimizer extends

[GitHub] spark issue #14333: [SPARK-16696][ML][MLLib] unused broadcast variables do d...

2016-07-24 Thread WeichenXu123

Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14333 @srowen The `bcNewCenters` in `KMeans` has a problem. Checking the code logic in detail, we can find that in each loop, it should destroy the broadcast var `bcNewCenters` generated

[GitHub] spark pull request #14335: [SPARK-16697][ML][MLLib] improve LDA submitMiniBa...

2016-07-24 Thread WeichenXu123

Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/14335#discussion_r72003530 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -472,12 +473,13 @@ final class OnlineLDAOptimizer extends

[GitHub] spark pull request #14335: [SPARK-16697][ML][MLLib] improve LDA submitMiniBa...

2016-07-24 Thread WeichenXu123

Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/14335#discussion_r72003428 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -472,12 +473,13 @@ final class OnlineLDAOptimizer extends

[GitHub] spark issue #14333: [SPARK-16696][ML][MLLib] destroy KMeans bcNewCenters whe...

2016-07-27 Thread WeichenXu123

Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14333 @srowen yeah, the code logic here seems confusing, but I think it is right. Now I can explain it in a clear way: in essence, the logic can be expressed as follows: A0->I1->

[GitHub] spark issue #14326: [SPARK-3181] [ML] Implement RobustRegression with huber ...

2016-07-24 Thread WeichenXu123

Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14326 @yanboliang I went through the code and there are several problems that need to be solved: the robust regression has a parameter `sigma` which must be > 0, so it is a bound optimize prob

[GitHub] spark issue #14216: [SPARK-16561][MLLib] fix multivarOnlineSummary min/max b...

2016-07-22 Thread WeichenXu123

Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14216 @srowen Oh, I missed your comment about the loop braces; they are added now, thanks!

[GitHub] spark issue #14265: [PySpark] add picklable SparseMatrix in pyspark.ml.commo...

2016-07-21 Thread WeichenXu123

Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14265 cc @jkbradley Thanks!

[GitHub] spark issue #14216: [SPARK-16561][MLLib] fix multivarOnlineSummary min/max b...

2016-07-21 Thread WeichenXu123

Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14216 @srowen yeah I have pushed, "some minor update" https://github.com/apache/spark/pull/14216/commits/362074187d8845eeb40452eceec10f7e8ad805df

[GitHub] spark pull request #14333: [SPARK-16696][ML][MLLib] unused broadcast variabl...

2016-07-24 Thread WeichenXu123

GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/14333 [SPARK-16696][ML][MLLib] unused broadcast variables do destroy call to release memory in time ## What changes were proposed in this pull request? update unused broadcast in KMeans

[GitHub] spark pull request #14335: [SPARK-16697][ML][MLLib] improve LDA submitMiniBa...

2016-07-24 Thread WeichenXu123

GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/14335 [SPARK-16697][ML][MLLib] improve LDA submitMiniBatch method to avoid redundant RDD computation ## What changes were proposed in this pull request? In `LDAOptimizer.submitMiniBatch

[GitHub] spark issue #14265: [PySpark] add picklable SparseMatrix in pyspark.ml.commo...

2016-07-24 Thread WeichenXu123

Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14265 @srowen We can check the `VectorTests.test_serialize` function in python/ml/tests.py; it contains a test for `SparseMatrix` serialization and deserialization, so we can confirm that this works

[GitHub] spark pull request #14440: [SPARK-16835][ML] add training data unpersist han...

2016-08-01 Thread WeichenXu123

GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/14440 [SPARK-16835][ML] add training data unpersist handling when throw exception [SPARK-16835][ML] add training data `unpersist` handling when throw exception ## What changes were
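
The idea named in this pull request title, releasing cached training data even when training throws, can be sketched with a `try`/`finally` block. This is a minimal sketch and not the code from the PR: `fit_with_cleanup` and `train_fn` are hypothetical names, and the dataset is assumed only to expose `persist()` and `unpersist()` methods, as PySpark DataFrames and RDDs do.

```python
def fit_with_cleanup(dataset, train_fn):
    """Cache `dataset`, run `train_fn` on it, and always release the cache.

    Assumption: `dataset` exposes persist() and unpersist().
    `train_fn` is a hypothetical training callable that may raise.
    """
    dataset.persist()
    try:
        return train_fn(dataset)
    finally:
        # Runs on success and on failure, so the cached data is not leaked.
        dataset.unpersist()
```

The `finally` clause is the whole point of the sketch: the exception still propagates to the caller, but the cache is released first.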

[GitHub] spark pull request #14335: [SPARK-16697][ML][MLLib] improve LDA submitMiniBa...

2016-07-25 Thread WeichenXu123

Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/14335#discussion_r72013619 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -472,12 +473,13 @@ final class OnlineLDAOptimizer extends

[GitHub] spark issue #14333: [SPARK-16696][ML][MLLib] destroy KMeans bcNewCenters whe...

2016-07-25 Thread WeichenXu123

Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14333 @srowen KMeans.initKMeansParallel already implements the pattern "persist current step RDD, and unpersist previous one", but I think a persisted RDD can also break down becau
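
The pattern quoted in this comment, "persist current step RDD, and unpersist previous one", can be sketched as a generic loop. This is an illustrative sketch, not the code of `KMeans.initKMeansParallel`: `iterate_with_caching` and `step_fn` are hypothetical names, and each value is assumed to expose `persist()`, `unpersist()`, and `count()`, as PySpark RDDs do.

```python
def iterate_with_caching(initial, step_fn, num_steps):
    """Apply `step_fn` repeatedly, keeping only the newest result cached.

    Assumption: `initial` and every value returned by `step_fn` expose
    persist() (returning the object itself), unpersist(), and count().
    """
    current = initial.persist()
    for _ in range(num_steps):
        nxt = step_fn(current).persist()
        # Force evaluation while the previous step is still cached,
        # so the new step is not recomputed from scratch later.
        nxt.count()
        current.unpersist()
        current = nxt
    # The caller is responsible for unpersisting the returned value.
    return current
```

Materializing the new step before dropping the previous one is the design choice that keeps each iteration from recomputing its whole lineage.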

[GitHub] spark issue #14333: [SPARK-16696][ML][MLLib] destroy KMeans bcNewCenters whe...

2016-07-25 Thread WeichenXu123

Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14333 @srowen I checked the code about KMeans `bcNewCenters` again. If we want to make sure the recovery of the RDD will succeed in any unexpected case, we have to keep all the `bcNewCenters

[GitHub] spark pull request #14335: [SPARK-16697][ML][MLLib] improve LDA submitMiniBa...

2016-07-25 Thread WeichenXu123

Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/14335#discussion_r72014278 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -472,12 +473,13 @@ final class OnlineLDAOptimizer extends

[GitHub] spark issue #14333: [SPARK-16696][ML][MLLib] destroy KMeans bcNewCenters whe...

2016-07-30 Thread WeichenXu123

Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14333 @srowen I checked the code again, the problem I mentioned above `But now I found another problem in BisectKMeans: in line 191 there is an iteration that also needs this pattern "persist

[GitHub] spark pull request #14604: [Doc] add config option spark.ui.enabled into doc...

2016-08-11 Thread WeichenXu123

GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/14604 [Doc] add config option spark.ui.enabled into document ## What changes were proposed in this pull request? The configuration doc is missing the config option `spark.ui.enabled` (default

[GitHub] spark issue #14483: [SPARK-16880][ML][MLLib] make ann training data persiste...

2016-08-03 Thread WeichenXu123

Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14483 @srowen yeah, the other algorithms using LBFGS all have this pattern; only ANN is missing it.

[GitHub] spark issue #14156: [SPARK-16499][ML][MLLib] improve ApplyInPlace function i...

2016-08-03 Thread WeichenXu123

Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14156 yeah, currently it seems to add a little overhead (it does a copy), but I think it will be able to take advantage of Breeze optimizations in the future, e.g., SIMD instructions or something?

[GitHub] spark issue #14156: [SPARK-16499][ML][MLLib] improve ApplyInPlace function i...

2016-08-03 Thread WeichenXu123

Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14156 @srowen The := operator in BDM simply copies one BDM to another, and it is widely used in the Breeze source; e.g., we can check the DenseMatrix.copy function in Breeze: it first use

[GitHub] spark issue #14156: [SPARK-16499][ML][MLLib] improve ApplyInPlace function i...

2016-08-04 Thread WeichenXu123

Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14156 @srowen yeah, the function supplied here cannot be turned into SIMD instructions, but I think it can do some parallelization optimization on large matrices; for example, we can split

[GitHub] spark pull request #14483: [SPARK-16880][ML][MLLib] make ann training data p...

2016-08-03 Thread WeichenXu123

GitHub user WeichenXu123 opened a pull request: https://github.com/apache/spark/pull/14483 [SPARK-16880][ML][MLLib] make ann training data persisted if needed ## What changes were proposed in this pull request? To Make sure ANN layer input training data to be persisted

[GitHub] spark issue #14629: [WIP][SPARK-17046][SQL] prevent user using dataframe.sel...

2016-08-14 Thread WeichenXu123

Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14629 MySQL does not allow a select with 0 columns, and I think select() is useless; no one will perform such an operation. So, would it be better to generate a compile error when detecting that code uses `df.select

[GitHub] spark issue #14520: [SPARK-16934][ML][MLLib] Improve LogisticCostFun to avoi...

2016-08-14 Thread WeichenXu123

Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14520 @yanboliang Thanks for the careful review!

[GitHub] spark issue #14520: [SPARK-16934][ML][MLLib] Improve LogisticCostFun to avoi...

2016-08-14 Thread WeichenXu123

Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14520 @sethah I attach the test result and it looks good.

[GitHub] spark issue #14520: [SPARK-16934][ML][MLLib] Improve LogisticCostFun to avoi...

2016-08-13 Thread WeichenXu123

Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/14520 cc @yanboliang Thanks!
