[GitHub] spark pull request #18731: [SPARK-20990][SQL] Read all JSON documents in fil...

2017-07-31 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/18731#discussion_r130294946 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala --- @@ -347,13 +347,18 @@ class JacksonParser

[GitHub] spark issue #18622: [SPARK-21340] Bring pyspark BinaryClassificationMetrics ...

2017-08-15 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/18622 @srowen any comment on this PR? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark issue #18329: [SPARK-19909][SS] Disabling the usage of a temporary dir...

2017-08-15 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/18329 @zsxwing @tdas any comment on this? Thanks.

[GitHub] spark pull request #18951: [SPARK-21738] Thriftserver doesn't cancel jobs wh...

2017-08-15 Thread mgaido91
GitHub user mgaido91 opened a pull request: https://github.com/apache/spark/pull/18951 [SPARK-21738] Thriftserver doesn't cancel jobs when session is closed ## What changes were proposed in this pull request? When a session is closed the Thriftserver doesn't cancel the jobs

[GitHub] spark pull request #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with...

2017-08-16 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/18538#discussion_r133385546 --- Diff: mllib/src/test/scala/org/apache/spark/ml/evaluation/ClusteringEvaluatorSuite.scala --- @@ -0,0 +1,225 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #18731: [SPARK-20990][SQL] Read all JSON documents in files when...

2017-08-15 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/18731 @gatorsmile any feedback on this? I added support for all the corrupt record handling modes and I added the related tests. Is anything else needed?

[GitHub] spark pull request #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with...

2017-08-15 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/18538#discussion_r133182964 --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/ClusteringEvaluator.scala --- @@ -0,0 +1,240 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with...

2017-08-15 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/18538#discussion_r133185455 --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/ClusteringEvaluator.scala --- @@ -0,0 +1,240 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with...

2017-08-15 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/18538#discussion_r133185265 --- Diff: mllib/src/test/scala/org/apache/spark/ml/evaluation/ClusteringEvaluatorSuite.scala --- @@ -0,0 +1,225 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #18622: [SPARK 21340] Bring pyspark BinaryClassificationM...

2017-07-13 Thread mgaido91
GitHub user mgaido91 opened a pull request: https://github.com/apache/spark/pull/18622 [SPARK 21340] Bring pyspark BinaryClassificationMetrics to parity with the Scala API ## What changes were proposed in this pull request? Adding all the missing methods in the pyspark API

[GitHub] spark pull request #18731: [SPARK-20990][SQL] Read all JSON documents in fil...

2017-07-25 Thread mgaido91
GitHub user mgaido91 opened a pull request: https://github.com/apache/spark/pull/18731 [SPARK-20990][SQL] Read all JSON documents in files when multiline mode is on ## What changes were proposed in this pull request? The PR improves the JSON parsing so that now all

[GitHub] spark issue #18731: [SPARK-20990][SQL] Read all JSON documents in files when...

2017-07-25 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/18731 cc @gatorsmile

[GitHub] spark issue #18731: [SPARK-20990][SQL] Read all JSON documents in files when...

2017-07-26 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/18731 I am debugging, thanks.

[GitHub] spark issue #18731: [SPARK-20990][SQL] Read all JSON documents in files when...

2017-07-26 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/18731 The reason for the UT failure is that in these two UTs we are passing invalid JSONs (mind the extra closing curly brace): - https://github.com/apache/spark/blob/master/sql/core/src/test

[GitHub] spark pull request #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with...

2017-07-05 Thread mgaido91
GitHub user mgaido91 opened a pull request: https://github.com/apache/spark/pull/18538 [SPARK-14516][ML] Adding ClusteringEvaluator with the implementation of Cosine silhouette and squared Euclidean silhouette. ## What changes were proposed in this pull request

[GitHub] spark issue #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with the im...

2017-08-09 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/18538 @yanboliang thanks for your review. I refactored the code according to your suggestions and I removed the cosine implementation. Could you please review it now? Thanks.

[GitHub] spark pull request #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with...

2017-08-03 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/18538#discussion_r131121892 --- Diff: mllib/src/test/scala/org/apache/spark/ml/evaluation/ClusteringEvaluatorSuite.scala --- @@ -0,0 +1,235 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #18951: [SPARK-21738] Thriftserver doesn't cancel jobs when sess...

2017-08-16 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/18951 @dongjoon-hyun I think there is no problem, should I submit a new PR on branch-2.2 too then?

[GitHub] spark pull request #18622: [SPARK-21340] Bring pyspark BinaryClassificationM...

2017-08-19 Thread mgaido91
Github user mgaido91 closed the pull request at: https://github.com/apache/spark/pull/18622

[GitHub] spark issue #18622: [SPARK-21340] Bring pyspark BinaryClassificationMetrics ...

2017-08-19 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/18622 Thanks! Might you please resolve the JIRA as "Won't Fix" then? Yes, of course I am interested, thanks @holdenk, if I can help and contribute I am very happy!

[GitHub] spark issue #18622: [SPARK-21340] Bring pyspark BinaryClassificationMetrics ...

2017-08-19 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/18622 Thanks @holdenk, should I close this PR then?

[GitHub] spark issue #18985: [SPARK-21772] Fix staging parent directory for InsertInt...

2017-08-20 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/18985 I am unable to reproduce the issue either. Moreover, the PR would also change the location for normal SparkSQL applications. This sounds strange to me because then this problem should affect

[GitHub] spark pull request #19024: [SPARK-21469][ML][EXAMPLES] Adding Examples for F...

2017-08-22 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19024#discussion_r134596865 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/FeatureHasherExample.scala --- @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19024: [SPARK-21469][ML][EXAMPLES] Adding Examples for F...

2017-08-23 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19024#discussion_r134726421 --- Diff: examples/src/main/scala/org/apache/spark/examples/ml/FeatureHasherExample.scala --- @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with...

2017-08-17 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/18538#discussion_r133745930 --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/ClusteringEvaluator.scala --- @@ -0,0 +1,240 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with...

2017-08-17 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/18538#discussion_r133744052 --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/ClusteringEvaluator.scala --- @@ -0,0 +1,240 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with...

2017-08-17 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/18538#discussion_r133747305 --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/ClusteringEvaluator.scala --- @@ -0,0 +1,240 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with...

2017-08-18 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/18538#discussion_r133958240 --- Diff: mllib/src/main/scala/org/apache/spark/ml/evaluation/ClusteringEvaluator.scala --- @@ -0,0 +1,240 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #18329: [SPARK-19909][SS] Disabling the usage of a tempor...

2017-06-22 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/18329#discussion_r123440074 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala --- @@ -264,12 +281,12 @@ final class DataStreamWriter[T] private

[GitHub] spark pull request #18329: [SPARK-19909][SS] Disabling the usage of a tempor...

2017-06-22 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/18329#discussion_r123445514 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala --- @@ -264,12 +281,12 @@ final class DataStreamWriter[T] private

[GitHub] spark pull request #18329: [SPARK-19909][SS] Disabling the usage of a tempor...

2017-06-22 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/18329#discussion_r123440321 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala --- @@ -17,8 +17,10 @@ package

[GitHub] spark pull request #18329: [SPARK-19909][SS] Disabling the usage of a tempor...

2017-06-22 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/18329#discussion_r123451639 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala --- @@ -235,6 +239,21 @@ final class DataStreamWriter[T] private

[GitHub] spark pull request #18329: [SPARK-19909][SS] Disabling the usage of a tempor...

2017-06-22 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/18329#discussion_r123455655 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala --- @@ -264,12 +281,12 @@ final class DataStreamWriter[T] private

[GitHub] spark pull request #18329: [SPARK-19909][SS] Disabling the usage of a tempor...

2017-06-22 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/18329#discussion_r123458885 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala --- @@ -235,6 +239,21 @@ final class DataStreamWriter[T] private

[GitHub] spark pull request #18329: [SPARK-19909][SS] Disabling the usage of a tempor...

2017-06-23 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/18329#discussion_r123702984 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala --- @@ -235,6 +239,21 @@ final class DataStreamWriter[T] private

[GitHub] spark issue #17248: [SPARK-19909][SS] Batches will fail in case that tempora...

2017-06-14 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/17248 I don't think this PR properly resolves the issue. Indeed, it somewhat forces the metadata to be written in a local dir instead of the configured default filesystem. Of course, this fixes

[GitHub] spark pull request #18329: [SPARK-19909][SS] Disabling the usage of a tempor...

2017-06-16 Thread mgaido91
GitHub user mgaido91 opened a pull request: https://github.com/apache/spark/pull/18329 [SPARK-19909][SS] Disabling the usage of a temporary directory for the checkpoint location if the temporary directory is on a filesystem different from the default one. ## What changes were

[GitHub] spark issue #19204: [SPARK-21981][PYTHON][ML] Added Python interface for Clu...

2017-09-13 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19204 Thanks @WeichenXu123, I added it. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark issue #19204: [SPARK-21981][PYTHON][ML] Added Python interface for Clu...

2017-09-14 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19204 Thank you for your review and help @WeichenXu123!

[GitHub] spark pull request #19224: [SPARK-20990] Support multiple multiline JSON in ...

2017-09-13 Thread mgaido91
GitHub user mgaido91 opened a pull request: https://github.com/apache/spark/pull/19224 [SPARK-20990] Support multiple multiline JSON in the same file ## What changes were proposed in this pull request? The PR improves the JSON parsing so that now all the JSON documents
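
The idea behind SPARK-20990, reading every JSON document in a file rather than only the first one, can be illustrated outside Spark. The following is a minimal sketch using only the Python standard library; it is not the code from the PR (which changes the Jackson-based parser in Spark), and the function name `iter_json_documents` and the sample text are assumptions made for illustration.

```python
import json


def iter_json_documents(text):
    """Yield every top-level JSON value found in `text`, in order.

    Hypothetical helper: documents may span several lines and are
    separated only by whitespace.
    """
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        # raw_decode returns the parsed value and the index where it stopped.
        value, pos = decoder.raw_decode(text, pos)
        yield value


# Two multiline JSON documents in the same "file".
sample = '{\n  "id": 1\n}\n{\n  "id": 2,\n  "tags": ["a", "b"]\n}\n'
documents = list(iter_json_documents(sample))
print(documents)
```

A parser that stops after the first value would return only the document with `"id": 1`; continuing from the returned offset is what picks up the second one.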

[GitHub] spark pull request #19261: [SPARK-22040] Add current_date function with time...

2017-09-17 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19261#discussion_r139309404 --- Diff: python/pyspark/sql/functions.py --- @@ -793,12 +793,12 @@ def ntile(n): # -- Date/Timestamp functions

[GitHub] spark pull request #19261: [SPARK-22040] Add current_date function with time...

2017-09-17 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19261#discussion_r139309394 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -2508,6 +2508,14 @@ object functions { def current_date(): Column

[GitHub] spark pull request #19261: [SPARK-22040] Add current_date function with time...

2017-09-17 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19261#discussion_r139309376 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -2508,6 +2508,14 @@ object functions { def current_date(): Column

[GitHub] spark pull request #19261: [SPARK-22040] Add current_date function with time...

2017-09-17 Thread mgaido91
GitHub user mgaido91 opened a pull request: https://github.com/apache/spark/pull/19261 [SPARK-22040] Add current_date function with timezone id ## What changes were proposed in this pull request? Add current_date function with timezone id. ## How was this patch

[GitHub] spark issue #19447: [SPARK-22215][SQL] Add configuration to set the threshol...

2017-10-06 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19447 Here are the answers to your questions @gatorsmile, please tell me if I need to elaborate further. This conf controls how many inner classes are generated. A big value means that we will have

[GitHub] spark pull request #19447: [SPARK-22215][SQL] Add configuration to set the t...

2017-10-06 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19447#discussion_r143204826 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -279,11 +279,13 @@ class

[GitHub] spark pull request #19447: [SPARK-22215][SQL] Add configuration to set the t...

2017-10-06 Thread mgaido91
GitHub user mgaido91 opened a pull request: https://github.com/apache/spark/pull/19447 [SPARK-22215][SQL] Add configuration to set the threshold for generated class ## What changes were proposed in this pull request? SPARK-18016 introduced an arbitrary threshold

[GitHub] spark pull request #19447: [SPARK-22215][SQL] Add configuration to set the t...

2017-10-06 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19447#discussion_r143228854 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -934,6 +934,15 @@ object SQLConf { .intConf

[GitHub] spark issue #19447: [SPARK-22215][SQL] Add configuration to set the threshol...

2017-10-06 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19447 Yes, with small values it will produce a lot of small `NestedClass`es, but it will work. Instead, if the value is too high, all the functions (methods) which are created are inlined

[GitHub] spark pull request #19447: [SPARK-22215][SQL] Add configuration to set the t...

2017-10-06 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19447#discussion_r143220404 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -934,6 +934,15 @@ object SQLConf { .intConf

[GitHub] spark pull request #19447: [SPARK-22215][SQL] Add configuration to set the t...

2017-10-06 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19447#discussion_r143224951 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -934,6 +934,15 @@ object SQLConf { .intConf

[GitHub] spark issue #19447: [SPARK-22215][SQL] Add configuration to set the threshol...

2017-10-06 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19447 Thank you for your comments @kiszk and @dongjoon-hyun, I changed the approach a bit, following a similar approach in the same file: https://github.com/mgaido91/spark/blob

[GitHub] spark issue #19447: [SPARK-22215][SQL] Add configuration to set the threshol...

2017-10-06 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19447 @dongjoon-hyun I am not sure how I can test it. The use case in which this was useful is quite complex and I have not been able to reproduce it in a simpler way

[GitHub] spark pull request #19337: [SPARK-22114][ML][MLLIB]add epsilon for LDA

2017-10-10 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19337#discussion_r143705705 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/LDASuite.scala --- @@ -119,6 +121,8 @@ class LDASuite extends SparkFunSuite

[GitHub] spark pull request #19337: [SPARK-22114][ML][MLLIB]add epsilon for LDA

2017-10-10 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19337#discussion_r143704472 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/LDASuite.scala --- @@ -119,6 +121,8 @@ class LDASuite extends SparkFunSuite

[GitHub] spark pull request #19337: [SPARK-22114][ML][MLLIB]add epsilon for LDA

2017-10-10 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19337#discussion_r143720272 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -573,7 +584,8 @@ private[clustering] object OnlineLDAOptimizer

[GitHub] spark pull request #19447: [SPARK-22215][SQL] Add configuration to set the t...

2017-10-09 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19447#discussion_r143540779 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -279,11 +279,11 @@ class

[GitHub] spark pull request #19494: [SPARK-22249][SQL] isin with empty list throws ex...

2017-10-15 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19494#discussion_r144713232 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala --- @@ -104,7 +104,8 @@ case class

[GitHub] spark issue #19494: [SPARK-22249][SQL] isin with empty list throws exception...

2017-10-15 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19494 @srowen I also updated the UT to check all the possible cases.

[GitHub] spark pull request #19480: [SPARK-22226][SQL] splitExpression can create too...

2017-10-13 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19480#discussion_r144541081 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala --- @@ -201,6 +201,23 @@ class

[GitHub] spark pull request #19492: [SPARK-22228][SQL] Add support for array

2017-10-16 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19492#discussion_r144783459 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala --- @@ -170,6 +160,31 @@ class JsonFunctionsSuite extends QueryTest

[GitHub] spark pull request #19492: [SPARK-22228][SQL] Add support for array

2017-10-16 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19492#discussion_r144783193 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala --- @@ -343,6 +367,25 @@ class JacksonParser

[GitHub] spark pull request #19492: [SPARK-22228][SQL] Add support for array

2017-10-16 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19492#discussion_r144789703 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala --- @@ -343,6 +367,25 @@ class JacksonParser

[GitHub] spark pull request #19492: [SPARK-22228][SQL] Add support for array

2017-10-16 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19492#discussion_r144783499 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/JsonFunctionsSuite.scala --- @@ -170,6 +160,31 @@ class JsonFunctionsSuite extends QueryTest

[GitHub] spark pull request #19492: [SPARK-22228][SQL] Add support for array

2017-10-16 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19492#discussion_r144792285 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -536,26 +536,31 @@ case class JsonToStructs

[GitHub] spark issue #19340: [SPARK-22119][ML] Add cosine distance to KMeans

2017-10-16 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19340 @mengxr @yu-iskw, sorry for pinging you. I saw from the commits that you contributed to KMeans; could you please help review this PR? Thanks

[GitHub] spark pull request #19492: [SPARK-22228][SQL] Add support for array

2017-10-16 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19492#discussion_r144782180 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala --- @@ -89,6 +95,24 @@ class JacksonParser

[GitHub] spark pull request #19492: [SPARK-22228][SQL] Add support for array

2017-10-16 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19492#discussion_r144784037 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonExpressions.scala --- @@ -536,26 +536,31 @@ case class JsonToStructs

[GitHub] spark pull request #19492: [SPARK-22228][SQL] Add support for array

2017-10-16 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19492#discussion_r144783859 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala --- @@ -35,19 +35,25 @@ import org.apache.spark.util.Utils

[GitHub] spark issue #18329: [SPARK-19909][SS] Disabling the usage of a temporary dir...

2017-10-16 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/18329 kindly ping @zsxwing and @tdas

[GitHub] spark pull request #19492: [SPARK-22228][SQL] Add support for array

2017-10-16 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19492#discussion_r144836485 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala --- @@ -35,19 +35,25 @@ import org.apache.spark.util.Utils

[GitHub] spark pull request #19492: [SPARK-22228][SQL] Add support for array

2017-10-16 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19492#discussion_r144837336 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JacksonParser.scala --- @@ -343,6 +367,25 @@ class JacksonParser

[GitHub] spark issue #19340: [SPARK-22119][ML] Add cosine distance to KMeans

2017-10-06 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19340 A kind reminder to @srowen and @yanboliang: could you take a look at it when you have time? Thanks.

[GitHub] spark issue #19340: [SPARK-22119][ML] Add cosine distance to KMeans

2017-10-06 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19340 thanks for your reply @srowen. I saw it. My feeling is that so far there is no distance metric definition on `Vectors`. If we add the cosine distance, then we should add the Euclidean too
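
For readers unfamiliar with the two metrics discussed in this thread, here is a minimal standard-library sketch of cosine distance next to squared Euclidean distance. It is not Spark's `Vectors` API or the KMeans code from the PR; the function names and sample vectors are assumptions chosen for illustration.

```python
import math


def squared_euclidean_distance(a, b):
    """Sum of squared coordinate differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_distance(a, b):
    """One minus the cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # The angle is undefined when either vector has zero length.
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


u = [1.0, 0.0]
v = [0.0, 2.0]  # orthogonal to u
w = [3.0, 0.0]  # same direction as u, different length

print(cosine_distance(u, v), squared_euclidean_distance(u, v))
print(cosine_distance(u, w), squared_euclidean_distance(u, w))
```

The pair `u`, `w` shows why the choice matters for clustering: the vectors point in the same direction, so their cosine distance is zero, while their Euclidean distance is not.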

[GitHub] spark issue #19480: [SPARK-22226][SQL] splitExpression can create too many m...

2017-10-17 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19480 @bdrillard thanks but we removed the "guilty" test case for many reasons. Thank you anyway.

[GitHub] spark pull request #19494: [SPARK-22249][SQL] isin with empty list throws ex...

2017-10-13 Thread mgaido91
GitHub user mgaido91 opened a pull request: https://github.com/apache/spark/pull/19494 [SPARK-22249][SQL] isin with empty list throws exception on cached DataFrame ## What changes were proposed in this pull request? As pointed out in the JIRA, there is a bug which causes

[GitHub] spark issue #19494: [SPARK-22249][SQL] isin with empty list throws exception...

2017-10-13 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19494 @srowen do you mean replacing `contains` with `exists`? If so, could you please explain to me why `exists` is a better option? Thanks

[GitHub] spark pull request #19480: [SPARK-22226][SQL] splitExpression can create too...

2017-10-13 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19480#discussion_r144562615 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala --- @@ -201,6 +201,23 @@ class

[GitHub] spark issue #19480: [SPARK-22226][SQL] splitExpression can create too many m...

2017-10-13 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19480 sorry, the test failure is due to an OutOfMemory exception. I don't know whether it is possible to change the heap size used by sbt in the Jenkins job and I am not sure this is the right thing

[GitHub] spark pull request #19480: [SPARK-22226][SQL] splitExpression can create too...

2017-10-13 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19480#discussion_r144588780 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala --- @@ -2103,4 +2103,35 @@ class DataFrameSuite extends QueryTest

[GitHub] spark pull request #19494: [SPARK-22249][SQL] isin with empty list throws ex...

2017-10-14 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19494#discussion_r144690815 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala --- @@ -104,7 +104,8 @@ case class

[GitHub] spark pull request #19480: [SPARK-22226][SQL] splitExpression can create too...

2017-10-14 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19480#discussion_r144688322 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala --- @@ -2103,4 +2103,35 @@ class DataFrameSuite extends QueryTest

[GitHub] spark pull request #19480: [SPARK-22226][SQL] splitExpression can create too...

2017-10-14 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19480#discussion_r144688074 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -277,13 +292,25 @@ class

[GitHub] spark pull request #19480: [SPARK-22226] splitExpression can create too many...

2017-10-12 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19480#discussion_r144242361 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -798,10 +830,35 @@ class

[GitHub] spark pull request #19480: [SPARK-22226] splitExpression can create too many...

2017-10-12 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19480#discussion_r144243121 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -277,13 +291,25 @@ class

[GitHub] spark pull request #19480: [SPARK-22226] splitExpression can create too many...

2017-10-12 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19480#discussion_r144242883 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -798,10 +830,35 @@ class

[GitHub] spark pull request #19480: [SPARK-22226] splitExpression can create too many...

2017-10-12 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19480#discussion_r144242223 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -78,6 +78,20 @@ case class

[GitHub] spark pull request #19480: [SPARK-22226] splitExpression can create too many...

2017-10-12 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19480#discussion_r144242072 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -277,13 +291,25 @@ class

[GitHub] spark pull request #19480: [SPARK-22226] splitExpression can create too many...

2017-10-12 Thread mgaido91
GitHub user mgaido91 opened a pull request: https://github.com/apache/spark/pull/19480 [SPARK-22226] splitExpression can create too many method calls in the outer class ## What changes were proposed in this pull request? SPARK-18016 introduced {{NestedClass}} to avoid

[GitHub] spark pull request #19480: [SPARK-22226] splitExpression can create too many...

2017-10-12 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19480#discussion_r144245263 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -798,10 +830,35 @@ class

[GitHub] spark pull request #19480: [SPARK-22226] splitExpression can create too many...

2017-10-12 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19480#discussion_r144246243 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -798,10 +830,35 @@ class

[GitHub] spark pull request #19480: [SPARK-22226] splitExpression can create too many...

2017-10-12 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19480#discussion_r144252483 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -798,10 +830,35 @@ class

[GitHub] spark pull request #19480: [SPARK-22226] splitExpression can create too many...

2017-10-12 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19480#discussion_r144263743 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -798,10 +830,35 @@ class

[GitHub] spark pull request #19480: [SPARK-22226] splitExpression can create too many...

2017-10-12 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19480#discussion_r144253074 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -798,10 +830,35 @@ class

[GitHub] spark pull request #19480: [SPARK-22226] splitExpression can create too many...

2017-10-12 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19480#discussion_r144283907 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -78,6 +78,20 @@ case class

[GitHub] spark pull request #19480: [SPARK-22226] splitExpression can create too many...

2017-10-12 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19480#discussion_r144284488 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -798,10 +830,35 @@ class

[GitHub] spark pull request #19480: [SPARK-22226] splitExpression can create too many...

2017-10-12 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19480#discussion_r144288530 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -798,10 +830,35 @@ class

[GitHub] spark pull request #19480: [SPARK-22226] splitExpression can create too many...

2017-10-12 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19480#discussion_r144291974 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -798,10 +830,35 @@ class

[GitHub] spark pull request #19480: [SPARK-22226] splitExpression can create too many...

2017-10-12 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19480#discussion_r144294718 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -798,10 +830,35 @@ class

[GitHub] spark pull request #19480: [SPARK-22226] splitExpression can create too many...

2017-10-12 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19480#discussion_r144295623 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -798,10 +830,35 @@ class
