[GitHub] [spark] dilipbiswal commented on issue #24759: [SPARK-27395][SQL] Improve EXPLAIN command
dilipbiswal commented on issue #24759: [SPARK-27395][SQL] Improve EXPLAIN command URL: https://github.com/apache/spark/pull/24759#issuecomment-524910740 Thanks a LOT @cloud-fan @gatorsmile @maryannxue @maropu @ekoifman This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #25583: [MINOR][SS] Reuse KafkaSourceInitialOffsetWriter to deduplicate
AmplabJenkins removed a comment on issue #25583: [MINOR][SS] Reuse KafkaSourceInitialOffsetWriter to deduplicate URL: https://github.com/apache/spark/pull/25583#issuecomment-524910226 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #25583: [MINOR][SS] Reuse KafkaSourceInitialOffsetWriter to deduplicate
SparkQA removed a comment on issue #25583: [MINOR][SS] Reuse KafkaSourceInitialOffsetWriter to deduplicate URL: https://github.com/apache/spark/pull/25583#issuecomment-524899246 **[Test build #109741 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109741/testReport)** for PR 25583 at commit [`98c9d69`](https://github.com/apache/spark/commit/98c9d697b18d04cfa922636a05c751e4456b33aa).
[GitHub] [spark] AmplabJenkins removed a comment on issue #25583: [MINOR][SS] Reuse KafkaSourceInitialOffsetWriter to deduplicate
AmplabJenkins removed a comment on issue #25583: [MINOR][SS] Reuse KafkaSourceInitialOffsetWriter to deduplicate URL: https://github.com/apache/spark/pull/25583#issuecomment-524910231 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/109741/
[GitHub] [spark] AmplabJenkins commented on issue #25583: [MINOR][SS] Reuse KafkaSourceInitialOffsetWriter to deduplicate
AmplabJenkins commented on issue #25583: [MINOR][SS] Reuse KafkaSourceInitialOffsetWriter to deduplicate URL: https://github.com/apache/spark/pull/25583#issuecomment-524910231 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/109741/
[GitHub] [spark] AmplabJenkins commented on issue #25583: [MINOR][SS] Reuse KafkaSourceInitialOffsetWriter to deduplicate
AmplabJenkins commented on issue #25583: [MINOR][SS] Reuse KafkaSourceInitialOffsetWriter to deduplicate URL: https://github.com/apache/spark/pull/25583#issuecomment-524910226 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #25583: [MINOR][SS] Reuse KafkaSourceInitialOffsetWriter to deduplicate
SparkQA commented on issue #25583: [MINOR][SS] Reuse KafkaSourceInitialOffsetWriter to deduplicate URL: https://github.com/apache/spark/pull/25583#issuecomment-524910083 **[Test build #109741 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109741/testReport)** for PR 25583 at commit [`98c9d69`](https://github.com/apache/spark/commit/98c9d697b18d04cfa922636a05c751e4456b33aa).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] srowen commented on a change in pull request #25423: [SPARK-28701][test-hadoop3.2][test-java11][k8s] adding java11 support for pull request builds
srowen commented on a change in pull request #25423: [SPARK-28701][test-hadoop3.2][test-java11][k8s] adding java11 support for pull request builds URL: https://github.com/apache/spark/pull/25423#discussion_r317655226

File path: dev/run-tests.py

```diff
@@ -404,6 +404,12 @@ def run_scala_tests(build_tool, hadoop_version, test_modules, excluded_tags):
     if excluded_tags:
         test_profiles += ['-Dtest.exclude.tags=' + ",".join(excluded_tags)]
 
+    # set up java11 env if this is a pull request build with 'test-java11' in the title
+    if "test-java11" in os.environ["ghprbPullTitle"].lower():
+        os.environ["JAVA_HOME"] = "/usr/java/jdk-11.0.1"
+        os.environ["PATH"] = "%s/bin:%s" % (os.environ["JAVA_HOME"], os.environ["PATH"])
+        test_profiles += ['-Djava.version=11']
```

Review comment: Hm, and why does https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-3.2-jdk-11/ pass then? It is doing the same thing in the Jenkins config. (OK, I think I answered my own question below.) EDIT: Oh, because it doesn't run Pyspark tests?
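The title-gated environment setup in the diff above can be sketched as a standalone function. This is an illustrative analogue, not the patch itself: `env` stands in for `os.environ`, and `ghprbPullTitle` is the Jenkins GitHub PR builder variable carrying the pull request title, as used in the diff.

```python
def java11_profiles(env, jdk_home="/usr/java/jdk-11.0.1"):
    """Return extra build profiles, mutating env, when the PR title opts in to JDK 11.

    `env` is a dict standing in for os.environ; `ghprbPullTitle` holds the
    pull request title (set by the Jenkins GitHub PR builder plugin).
    """
    profiles = []
    if "test-java11" in env.get("ghprbPullTitle", "").lower():
        env["JAVA_HOME"] = jdk_home
        env["PATH"] = "%s/bin:%s" % (env["JAVA_HOME"], env.get("PATH", ""))
        profiles.append("-Djava.version=11")
    return profiles
```

Isolating the logic this way also makes the title-matching behavior easy to unit-test outside Jenkins.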
[GitHub] [spark] srowen commented on a change in pull request #25423: [SPARK-28701][test-hadoop3.2][test-java11][k8s] adding java11 support for pull request builds
srowen commented on a change in pull request #25423: [SPARK-28701][test-hadoop3.2][test-java11][k8s] adding java11 support for pull request builds URL: https://github.com/apache/spark/pull/25423#discussion_r317650935

File path: dev/run-tests.py

```diff
@@ -404,6 +404,12 @@ def run_scala_tests(build_tool, hadoop_version, test_modules, excluded_tags):
     if excluded_tags:
         test_profiles += ['-Dtest.exclude.tags=' + ",".join(excluded_tags)]
 
+    # set up java11 env if this is a pull request build with 'test-java11' in the title
+    if "test-java11" in os.environ["ghprbPullTitle"].lower():
+        os.environ["JAVA_HOME"] = "/usr/java/jdk-11.0.1"
+        os.environ["PATH"] = "%s/bin:%s" % (os.environ["JAVA_HOME"], os.environ["PATH"])
+        test_profiles += ['-Djava.version=11']
```

Review comment: It should use Java 11 if the path provides Java 11 and the test harness that runs Python tests does too. At least I don't know how else one would tell pyspark what to use! In fact I'm pretty sure the test failure here shows that it is using JDK 11. From JPMML: `java.lang.ClassNotFoundException: com.sun.xml.internal.bind.v2.ContextFactory`. This would be caused by JDK 11 changes. However, I don't get why all the other non-Python tests don't fail. Given the weird problem in https://github.com/apache/spark/pull/24651 I am wondering if we have some subtle classpath issues with how the Pyspark tests are run. This one, however, might be more directly solvable by figuring out what is suggesting the use of this old Sun JAXB implementation. I'll start digging around META-INF.
[GitHub] [spark] srowen commented on issue #25423: [SPARK-28701][test-hadoop3.2][test-java11][k8s] adding java11 support for pull request builds
srowen commented on issue #25423: [SPARK-28701][test-hadoop3.2][test-java11][k8s] adding java11 support for pull request builds URL: https://github.com/apache/spark/pull/25423#issuecomment-524903212 I personally think this is OK to merge simply because we need a way to test JDK 11, and this seems to do that. The rest of the error is orthogonal. So, in order to use this in a JDK 11 Jenkins build, how would one configure the Jenkins job? Right now it only triggers off the PR title (which is also useful). OK if that's a future step.
[GitHub] [spark] cloud-fan commented on a change in pull request #25570: [SPARK-23519][SQL] create view should work from query with duplicate output columns
cloud-fan commented on a change in pull request #25570: [SPARK-23519][SQL] create view should work from query with duplicate output columns URL: https://github.com/apache/spark/pull/25570#discussion_r317653404

File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/views.scala

```diff
@@ -360,13 +362,15 @@ object ViewHelper {
   def generateViewProperties(
       properties: Map[String, String],
       session: SparkSession,
-      analyzedPlan: LogicalPlan): Map[String, String] = {
+      analyzedPlan: LogicalPlan,
+      fieldNames: Array[String]): Map[String, String] = {
+    // for createViewCommand queryOutput may be different from fieldNames
     val queryOutput = analyzedPlan.schema.fieldNames
     // Generate the query column names, throw an AnalysisException if there exists duplicate column
     // names.
     SchemaUtils.checkColumnNameDuplication(
```

Review comment: we can make the diff much smaller by changing this line to
```
val colNamesToCheck = if (userSpecifiedColumns.nonEmpty) userSpecifiedColumns.map(_._1) else queryOutput
SchemaUtils.checkColumnNameDuplication(colNamesToCheck, ...)
```
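The shape of cloud-fan's suggestion — validate the user-specified view columns when present, otherwise the analyzed query's output names — can be sketched in Python. Both function names here are hypothetical stand-ins: Spark's actual `SchemaUtils.checkColumnNameDuplication` raises `AnalysisException` and takes additional context arguments.

```python
def check_column_name_duplication(names, case_sensitive=False):
    """Raise ValueError when `names` contains duplicates.

    A stand-in for Spark's SchemaUtils.checkColumnNameDuplication; by default
    the comparison is case-insensitive, mirroring Spark's default resolution.
    """
    seen = set()
    for name in names:
        key = name if case_sensitive else name.lower()
        if key in seen:
            raise ValueError("Found duplicate column(s): %s" % name)
        seen.add(key)


def names_to_check(user_specified_columns, query_output):
    # Prefer the user-specified (name, comment) column pairs of the view;
    # fall back to the analyzed query's output names when none were given.
    if user_specified_columns:
        return [name for name, _ in user_specified_columns]
    return list(query_output)
```

With this split, `CREATE VIEW v (x) AS SELECT a, a FROM t` checks only `["x"]` and succeeds, while the bare `CREATE VIEW v AS SELECT a, a FROM t` still fails on the duplicate `a`.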
[GitHub] [spark] AmplabJenkins removed a comment on issue #25104: [SPARK-28341][SQL] create a public API for V2SessionCatalog
AmplabJenkins removed a comment on issue #25104: [SPARK-28341][SQL] create a public API for V2SessionCatalog URL: https://github.com/apache/spark/pull/25104#issuecomment-524901494 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14788/
[GitHub] [spark] AmplabJenkins removed a comment on issue #25104: [SPARK-28341][SQL] create a public API for V2SessionCatalog
AmplabJenkins removed a comment on issue #25104: [SPARK-28341][SQL] create a public API for V2SessionCatalog URL: https://github.com/apache/spark/pull/25104#issuecomment-524901483 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25104: [SPARK-28341][SQL] create a public API for V2SessionCatalog
AmplabJenkins commented on issue #25104: [SPARK-28341][SQL] create a public API for V2SessionCatalog URL: https://github.com/apache/spark/pull/25104#issuecomment-524901483 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25104: [SPARK-28341][SQL] create a public API for V2SessionCatalog
AmplabJenkins commented on issue #25104: [SPARK-28341][SQL] create a public API for V2SessionCatalog URL: https://github.com/apache/spark/pull/25104#issuecomment-524901494 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14788/
[GitHub] [spark] tgravescs commented on a change in pull request #25409: [SPARK-28414][WEBUI] UI updates to show resource info in Standalone
tgravescs commented on a change in pull request #25409: [SPARK-28414][WEBUI] UI updates to show resource info in Standalone URL: https://github.com/apache/spark/pull/25409#discussion_r317651126

File path: core/src/main/scala/org/apache/spark/deploy/StandaloneResourceUtils.scala

```diff
@@ -345,4 +376,47 @@ private[spark] object StandaloneResourceUtils extends Logging {
   def needCoordinate(conf: SparkConf): Boolean = {
     conf.get(SPARK_RESOURCES_COORDINATE)
   }
+
+  def toMutable(immutableResources: Map[String, ResourceInformation])
+    : Map[String, MutableResourceInfo] = {
+    immutableResources.map { case (rName, rInfo) =>
+      val mutableAddress = new mutable.HashSet[String]()
+      rInfo.addresses.foreach(mutableAddress.add)
```

Review comment: this can just be: `mutableAddress ++= rInfo.addresses`
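The simplification tgravescs points out — one bulk collection copy instead of element-wise `add` calls — has a direct Python analogue, sketched here with plain dicts and sets (names are illustrative, not Spark's):

```python
def to_mutable(immutable_resources):
    """Copy a map of resource name -> tuple of addresses into mutable sets.

    Mirrors the suggested `mutableAddress ++= rInfo.addresses`: the whole
    address collection is added in one step via the set constructor,
    rather than looping and calling `add` per address.
    """
    return {name: set(addresses) for name, addresses in immutable_resources.items()}
```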
[GitHub] [spark] SparkQA commented on issue #25583: [MINOR][SS] Reuse KafkaSourceInitialOffsetWriter to deduplicate
SparkQA commented on issue #25583: [MINOR][SS] Reuse KafkaSourceInitialOffsetWriter to deduplicate URL: https://github.com/apache/spark/pull/25583#issuecomment-524899246 **[Test build #109741 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109741/testReport)** for PR 25583 at commit [`98c9d69`](https://github.com/apache/spark/commit/98c9d697b18d04cfa922636a05c751e4456b33aa).
[GitHub] [spark] SparkQA commented on issue #25104: [SPARK-28341][SQL] create a public API for V2SessionCatalog
SparkQA commented on issue #25104: [SPARK-28341][SQL] create a public API for V2SessionCatalog URL: https://github.com/apache/spark/pull/25104#issuecomment-524899251 **[Test build #109742 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109742/testReport)** for PR 25104 at commit [`6847688`](https://github.com/apache/spark/commit/6847688f85bac472b6777de77bc7f8a9654fd6f6).
[GitHub] [spark] AmplabJenkins removed a comment on issue #20965: [SPARK-21870][SQL] Split aggregation code into small functions
AmplabJenkins removed a comment on issue #20965: [SPARK-21870][SQL] Split aggregation code into small functions URL: https://github.com/apache/spark/pull/20965#issuecomment-524898399 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/109733/
[GitHub] [spark] AmplabJenkins removed a comment on issue #20965: [SPARK-21870][SQL] Split aggregation code into small functions
AmplabJenkins removed a comment on issue #20965: [SPARK-21870][SQL] Split aggregation code into small functions URL: https://github.com/apache/spark/pull/20965#issuecomment-524898391 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #20965: [SPARK-21870][SQL] Split aggregation code into small functions
AmplabJenkins commented on issue #20965: [SPARK-21870][SQL] Split aggregation code into small functions URL: https://github.com/apache/spark/pull/20965#issuecomment-524898399 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/109733/
[GitHub] [spark] AmplabJenkins commented on issue #20965: [SPARK-21870][SQL] Split aggregation code into small functions
AmplabJenkins commented on issue #20965: [SPARK-21870][SQL] Split aggregation code into small functions URL: https://github.com/apache/spark/pull/20965#issuecomment-524898391 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #20965: [SPARK-21870][SQL] Split aggregation code into small functions
SparkQA removed a comment on issue #20965: [SPARK-21870][SQL] Split aggregation code into small functions URL: https://github.com/apache/spark/pull/20965#issuecomment-524840399 **[Test build #109733 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109733/testReport)** for PR 20965 at commit [`fee5edb`](https://github.com/apache/spark/commit/fee5edb6ba82149329e3452c1f4d8580ff251996).
[GitHub] [spark] SparkQA commented on issue #20965: [SPARK-21870][SQL] Split aggregation code into small functions
SparkQA commented on issue #20965: [SPARK-21870][SQL] Split aggregation code into small functions URL: https://github.com/apache/spark/pull/20965#issuecomment-524898067 **[Test build #109733 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109733/testReport)** for PR 20965 at commit [`fee5edb`](https://github.com/apache/spark/commit/fee5edb6ba82149329e3452c1f4d8580ff251996).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] Ngone51 commented on a change in pull request #25409: [SPARK-28414][WEBUI] UI updates to show resource info in Standalone
Ngone51 commented on a change in pull request #25409: [SPARK-28414][WEBUI] UI updates to show resource info in Standalone URL: https://github.com/apache/spark/pull/25409#discussion_r317646956

File path: core/src/main/scala/org/apache/spark/deploy/master/WorkerInfo.scala

```diff
@@ -64,12 +61,54 @@ private[spark] class WorkerInfo(
   def coresFree: Int = cores - coresUsed
   def memoryFree: Int = memory - memoryUsed
 
-  def resourcesFree: Map[String, Int] = {
+  def resourcesAmountFree: Map[String, Int] = {
     resources.map { case (rName, rInfo) =>
       rName -> rInfo.availableAddrs.length
     }
   }
+
+  def resourcesInfo[T: ClassTag]: Map[String, T] = {
+    resources.map { case (rName, rInfo) =>
+      rName -> createResourceInfo(rName, rInfo.addresses, implicitly[ClassTag[T]])
+    }
+  }
+
+  def resourcesInfoFree: Map[String, ResourceInformation] = {
+    resources.map { case (rName, rInfo) =>
+      rName -> createResourceInfo(rName, rInfo.availableAddrs,
+        implicitly[ClassTag[ResourceInformation]])
+    }
+  }
+
+  def resourcesInfoUsed[T: ClassTag]: Map[String, T] = {
+    resources.map { case (rName, rInfo) =>
+      rName -> createResourceInfo(rName, rInfo.assignedAddrs, implicitly[ClassTag[T]])
+    }
+  }
+
+  private def createResourceInfo[T](
+      name: String,
+      addresses: Seq[String],
+      ct: ClassTag[T]): T = {
+    val clazz = ct.runtimeClass
+    val rf = {
+      clazz match {
+        case _ if clazz.equals(classOf[MutableResourceInfo]) =>
```

Review comment: Thanks. I chose the alternative one, which seems simpler in WorkerInfo, while the other one actually needs 4 functions (combinations of [immutable, mutable] * [total, used]), which seems noisy in WorkerInfo from my view.
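The ClassTag-driven construction debated above — picking which resource-info class to build from the requested runtime class — can be sketched in Python with a class-keyed factory table. All names here are hypothetical stand-ins for the Scala types in the diff, not Spark's actual API:

```python
class ResourceInformation:
    """Immutable view of a resource: name plus a fixed tuple of addresses."""
    def __init__(self, name, addresses):
        self.name, self.addresses = name, tuple(addresses)


class MutableResourceInfo:
    """Mutable view of a resource: name plus a growable set of addresses."""
    def __init__(self, name, addresses):
        self.name, self.addresses = name, set(addresses)


# Class-keyed dispatch table, playing the role of matching on ct.runtimeClass.
_FACTORIES = {
    ResourceInformation: ResourceInformation,
    MutableResourceInfo: MutableResourceInfo,
}


def create_resource_info(cls, name, addresses):
    """Build the requested resource-info flavor for one (name, addresses) pair."""
    try:
        return _FACTORIES[cls](name, addresses)
    except KeyError:
        raise TypeError("unsupported resource info class: %r" % cls)
```

The trade-off Ngone51 describes is visible here too: one dispatching helper serves every [immutable, mutable] * [total, used] combination, at the cost of a runtime class check instead of four separate statically-typed functions.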
[GitHub] [spark] SparkQA commented on issue #25409: [SPARK-28414][WEBUI] UI updates to show resource info in Standalone
SparkQA commented on issue #25409: [SPARK-28414][WEBUI] UI updates to show resource info in Standalone URL: https://github.com/apache/spark/pull/25409#issuecomment-524896215 **[Test build #109740 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109740/testReport)** for PR 25409 at commit [`1e64a33`](https://github.com/apache/spark/commit/1e64a33dd0a30a8b816025e528622c5d9c9f99eb).
[GitHub] [spark] AmplabJenkins removed a comment on issue #25409: [SPARK-28414][WEBUI] UI updates to show resource info in Standalone
AmplabJenkins removed a comment on issue #25409: [SPARK-28414][WEBUI] UI updates to show resource info in Standalone URL: https://github.com/apache/spark/pull/25409#issuecomment-524895177 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25409: [SPARK-28414][WEBUI] UI updates to show resource info in Standalone
AmplabJenkins removed a comment on issue #25409: [SPARK-28414][WEBUI] UI updates to show resource info in Standalone URL: https://github.com/apache/spark/pull/25409#issuecomment-524895186 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14787/
[GitHub] [spark] HeartSaVioR commented on a change in pull request #25583: [MINOR][SS] Reuse KafkaSourceInitialOffsetWriter to deduplicate
HeartSaVioR commented on a change in pull request #25583: [MINOR][SS] Reuse KafkaSourceInitialOffsetWriter to deduplicate URL: https://github.com/apache/spark/pull/25583#discussion_r317645581

## File path: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceInitialOffsetWriter.scala

@@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.kafka010
+
+import java.io._
+import java.nio.charset.StandardCharsets
+
+import org.apache.commons.io.IOUtils
+
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.execution.streaming.{HDFSMetadataLog, SerializedOffset}
+
+/** A version of [[HDFSMetadataLog]] specialized for saving the initial offsets. */
+private[kafka010] class KafkaSourceInitialOffsetWriter(
+    sparkSession: SparkSession,
+    metadataPath: String)
+  extends HDFSMetadataLog[KafkaSourceOffset](sparkSession, metadataPath) {
+
+  val VERSION = 1
+
+  override def serialize(metadata: KafkaSourceOffset, out: OutputStream): Unit = {
+    out.write(0) // A zero byte is written to support Spark 2.1.0 (SPARK-19517)
+    val writer = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8))
+    writer.write("v" + VERSION + "\n")
+    writer.write(metadata.json)
+    writer.flush
+  }
+
+  override def deserialize(in: InputStream): KafkaSourceOffset = {
+    in.read() // A zero byte is read to support Spark 2.1.0 (SPARK-19517)
+    val content = IOUtils.toString(new InputStreamReader(in, StandardCharsets.UTF_8))
+    // HDFSMetadataLog guarantees that it never creates a partial file.
+    assert(content.length != 0)

Review comment: We want to assert here, but `require` would be better, since `assert` may simply be elided depending on JVM/compiler options. I've just changed it to `require(content.nonEmpty)`.
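The `assert` vs `require` point above generalizes beyond Scala: Scala's `assert` can be elided by the compiler, while `require` always throws on a violated precondition. As a hedged illustration in Python (where a bare `assert` is likewise stripped under `python -O`), an explicit check plays the role of Scala's `require`; the helper name here is ours, not Spark's:

```python
def require(condition, message="requirement failed"):
    """Analogue of Scala's require: raises even when assertions are disabled."""
    if not condition:
        raise ValueError(message)

# The offset log content must be non-empty, mirroring require(content.nonEmpty).
content = "v1\n"
require(len(content) > 0, "offset log content must not be empty")  # passes

# A bare `assert content` would be silently skipped under `python -O`,
# so it cannot be relied on to guard real invariants in production.
```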
[GitHub] [spark] AmplabJenkins commented on issue #25409: [SPARK-28414][WEBUI] UI updates to show resource info in Standalone
AmplabJenkins commented on issue #25409: [SPARK-28414][WEBUI] UI updates to show resource info in Standalone URL: https://github.com/apache/spark/pull/25409#issuecomment-524895177 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25409: [SPARK-28414][WEBUI] UI updates to show resource info in Standalone
AmplabJenkins commented on issue #25409: [SPARK-28414][WEBUI] UI updates to show resource info in Standalone URL: https://github.com/apache/spark/pull/25409#issuecomment-524895186 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14787/ Test PASSed.
[GitHub] [spark] Ngone51 commented on a change in pull request #25409: [SPARK-28414][WEBUI] UI updates to show resource info in Standalone
Ngone51 commented on a change in pull request #25409: [SPARK-28414][WEBUI] UI updates to show resource info in Standalone URL: https://github.com/apache/spark/pull/25409#discussion_r317644596

## File path: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala

@@ -242,6 +243,22 @@ private[deploy] class Worker(
       System.exit(1)
     }
   }
+    resources.keys.foreach { rName =>
+      resourcesUsed(rName) = MutableResourceInfo(rName, new HashSet[String])
+    }
+  }
+
+  private def updateResourcesUsed(deltaInfo: Map[String, ResourceInformation], add: Boolean)

Review comment: good point!
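For readers following along, the `updateResourcesUsed(deltaInfo, add)` helper in the diff applies a delta of resource addresses to the worker's usage bookkeeping. A minimal Python sketch of that idea (the names and shapes here are illustrative only, not Spark's actual types):

```python
# Illustrative sketch: each resource name maps to the set of addresses in use,
# loosely mirroring Worker.resourcesUsed with MutableResourceInfo.
resources_used = {}

def update_resources_used(delta, add):
    """Apply (add=True) or release (add=False) a delta of {resource: addresses}."""
    for name, addresses in delta.items():
        used = resources_used.setdefault(name, set())
        if add:
            used.update(addresses)
        else:
            used.difference_update(addresses)

update_resources_used({"gpu": {"0", "1"}}, add=True)   # an executor is launched
update_resources_used({"gpu": {"1"}}, add=False)       # the executor exits
# resources_used is now {"gpu": {"0"}}
```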
[GitHub] [spark] HeartSaVioR commented on a change in pull request #25582: [SPARK-28875][DSTREAMS][SS][TESTS] Add Task retry tests to make sure new consumer used
HeartSaVioR commented on a change in pull request #25582: [SPARK-28875][DSTREAMS][SS][TESTS] Add Task retry tests to make sure new consumer used URL: https://github.com/apache/spark/pull/25582#discussion_r317642200

## File path: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaDataConsumerSuite.scala

@@ -60,23 +70,40 @@ class KafkaDataConsumerSuite extends SharedSparkSession with PrivateMethodTester
     assert(e.getCause === cause)
   }

+  test("new KafkaDataConsumer instance in case of Task retry") {
+    try {
+      KafkaDataConsumer.cache.clear()
+
+      val kafkaParams = getKafkaParams()
+      val key = new CacheKey(groupId, topicPartition)
+
+      val context1 = new TaskContextImpl(0, 0, 0, 0, 0, null, null, null)
+      TaskContext.setTaskContext(context1)
+      val consumer1 = KafkaDataConsumer.acquire(topicPartition, kafkaParams, true)
+      consumer1.release()
+
+      assert(KafkaDataConsumer.cache.size() == 1)
+      assert(KafkaDataConsumer.cache.get(key).eq(consumer1.internalConsumer))
+
+      val context2 = new TaskContextImpl(0, 0, 0, 0, 1, null, null, null)
+      TaskContext.setTaskContext(context2)
+      val consumer2 = KafkaDataConsumer.acquire(topicPartition, kafkaParams, true)
+      consumer2.release()
+
+      // The first consumer should be removed from cache and new non-cached should be returned

Review comment: I'd say consumer2 should be cached, since it's created after the invalidation, but this PR only touches the tests, so that's OK.
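The behavior under test: consumers are cached by (group id, partition), and a task retry (a higher attempt number for the same key) invalidates the entry created by the failed attempt. A rough Python sketch of the caching the reviewer suggests, where the post-invalidation consumer does get cached; the names are illustrative, not Spark's API:

```python
# Illustrative sketch: cache keyed by (group_id, partition), remembering the
# task attempt number that created each entry. A later attempt evicts the old
# consumer (it may be in a bad state after the task failure) and caches a new one.
cache = {}

class Consumer:
    def __init__(self, attempt):
        self.attempt = attempt

def acquire(key, attempt_number):
    cached = cache.get(key)
    if cached is not None and cached.attempt < attempt_number:
        del cache[key]  # invalidate the consumer from the failed attempt
        cached = None
    if cached is None:
        cached = Consumer(attempt_number)
        cache[key] = cached
    return cached

key = ("spark-kafka-source-group", 0)
c1 = acquire(key, attempt_number=0)
c2 = acquire(key, attempt_number=1)  # retry: c1's entry is evicted
# c2 is a fresh consumer and is now the only cached entry
```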
[GitHub] [spark] HeartSaVioR commented on a change in pull request #25582: [SPARK-28875][DSTREAMS][SS][TESTS] Add Task retry tests to make sure new consumer used
HeartSaVioR commented on a change in pull request #25582: [SPARK-28875][DSTREAMS][SS][TESTS] Add Task retry tests to make sure new consumer used URL: https://github.com/apache/spark/pull/25582#discussion_r317643184

## File path: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala

@@ -78,7 +78,7 @@ private[kafka010] sealed trait KafkaDataConsumer {
   def release(): Unit

   /** Reference to the internal implementation that this wrapper delegates to */
-  protected def internalConsumer: InternalKafkaConsumer
+  def internalConsumer: InternalKafkaConsumer

Review comment: That's still effectively `private[kafka010]`, since the enclosing class has that scope, so it seems OK.
[GitHub] [spark] SparkQA commented on issue #25581: [SPARK-28495][SQL] Introduce ANSI store assignment policy for table insertion
SparkQA commented on issue #25581: [SPARK-28495][SQL] Introduce ANSI store assignment policy for table insertion URL: https://github.com/apache/spark/pull/25581#issuecomment-524886240 **[Test build #109739 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109739/testReport)** for PR 25581 at commit [`68da9cc`](https://github.com/apache/spark/commit/68da9cca6030ed2bc248ccbb6d7f8731f0bae59d).
[GitHub] [spark] AmplabJenkins removed a comment on issue #25581: [SPARK-28495][SQL] Introduce ANSI store assignment policy for table insertion
AmplabJenkins removed a comment on issue #25581: [SPARK-28495][SQL] Introduce ANSI store assignment policy for table insertion URL: https://github.com/apache/spark/pull/25581#issuecomment-524885423 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25581: [SPARK-28495][SQL] Introduce ANSI store assignment policy for table insertion
AmplabJenkins removed a comment on issue #25581: [SPARK-28495][SQL] Introduce ANSI store assignment policy for table insertion URL: https://github.com/apache/spark/pull/25581#issuecomment-524885431 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14786/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25581: [SPARK-28495][SQL] Introduce ANSI store assignment policy for table insertion
AmplabJenkins commented on issue #25581: [SPARK-28495][SQL] Introduce ANSI store assignment policy for table insertion URL: https://github.com/apache/spark/pull/25581#issuecomment-524885431 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14786/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25581: [SPARK-28495][SQL] Introduce ANSI store assignment policy for table insertion
AmplabJenkins commented on issue #25581: [SPARK-28495][SQL] Introduce ANSI store assignment policy for table insertion URL: https://github.com/apache/spark/pull/25581#issuecomment-524885423 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25583: [MINOR][SS] Reuse KafkaSourceInitialOffsetWriter to deduplicate
AmplabJenkins removed a comment on issue #25583: [MINOR][SS] Reuse KafkaSourceInitialOffsetWriter to deduplicate URL: https://github.com/apache/spark/pull/25583#issuecomment-524883602 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/109736/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25583: [MINOR][SS] Reuse KafkaSourceInitialOffsetWriter to deduplicate
AmplabJenkins removed a comment on issue #25583: [MINOR][SS] Reuse KafkaSourceInitialOffsetWriter to deduplicate URL: https://github.com/apache/spark/pull/25583#issuecomment-524883588 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #25583: [MINOR][SS] Reuse KafkaSourceInitialOffsetWriter to deduplicate
SparkQA removed a comment on issue #25583: [MINOR][SS] Reuse KafkaSourceInitialOffsetWriter to deduplicate URL: https://github.com/apache/spark/pull/25583#issuecomment-524873878 **[Test build #109736 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109736/testReport)** for PR 25583 at commit [`1daf77b`](https://github.com/apache/spark/commit/1daf77b5822260a7b5d1add98ba6b09d4151ffb3).
[GitHub] [spark] gengliangwang commented on a change in pull request #25581: [SPARK-28495][SQL] Introduce ANSI store assignment policy for table insertion
gengliangwang commented on a change in pull request #25581: [SPARK-28495][SQL] Introduce ANSI store assignment policy for table insertion URL: https://github.com/apache/spark/pull/25581#discussion_r317631925

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala

@@ -371,12 +373,14 @@ object DataType {
       byName: Boolean,
       resolver: Resolver,
       context: String,
+      storeAssignmentPolicy: StoreAssignmentPolicy.Value,

Review comment: I think keeping the original policy is also fine. Otherwise, it is hard to tell that we are using ANSI mode if `isStrict` is `false`.
[GitHub] [spark] srowen closed pull request #25439: [SPARK-28709][DSTREAMS] Fix StreamingContext leak through Streaming
srowen closed pull request #25439: [SPARK-28709][DSTREAMS] Fix StreamingContext leak through Streaming URL: https://github.com/apache/spark/pull/25439
[GitHub] [spark] srowen commented on issue #25439: [SPARK-28709][DSTREAMS] Fix StreamingContext leak through Streaming
srowen commented on issue #25439: [SPARK-28709][DSTREAMS] Fix StreamingContext leak through Streaming URL: https://github.com/apache/spark/pull/25439#issuecomment-524883593 Merged to master
[GitHub] [spark] AmplabJenkins commented on issue #25583: [MINOR][SS] Reuse KafkaSourceInitialOffsetWriter to deduplicate
AmplabJenkins commented on issue #25583: [MINOR][SS] Reuse KafkaSourceInitialOffsetWriter to deduplicate URL: https://github.com/apache/spark/pull/25583#issuecomment-524883588 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #25583: [MINOR][SS] Reuse KafkaSourceInitialOffsetWriter to deduplicate
SparkQA commented on issue #25583: [MINOR][SS] Reuse KafkaSourceInitialOffsetWriter to deduplicate URL: https://github.com/apache/spark/pull/25583#issuecomment-524883440 **[Test build #109736 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109736/testReport)** for PR 25583 at commit [`1daf77b`](https://github.com/apache/spark/commit/1daf77b5822260a7b5d1add98ba6b09d4151ffb3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on issue #25583: [MINOR][SS] Reuse KafkaSourceInitialOffsetWriter to deduplicate
AmplabJenkins commented on issue #25583: [MINOR][SS] Reuse KafkaSourceInitialOffsetWriter to deduplicate URL: https://github.com/apache/spark/pull/25583#issuecomment-524883602 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/109736/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25584: [SPARK-28876][SQL] fallBackToHdfs should not support Hive partitioned table
AmplabJenkins removed a comment on issue #25584: [SPARK-28876][SQL] fallBackToHdfs should not support Hive partitioned table URL: https://github.com/apache/spark/pull/25584#issuecomment-524882062 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14784/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25584: [SPARK-28876][SQL] fallBackToHdfs should not support Hive partitioned table
AmplabJenkins removed a comment on issue #25584: [SPARK-28876][SQL] fallBackToHdfs should not support Hive partitioned table URL: https://github.com/apache/spark/pull/25584#issuecomment-524882056 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #25584: [SPARK-28876][SQL] fallBackToHdfs should not support Hive partitioned table
SparkQA commented on issue #25584: [SPARK-28876][SQL] fallBackToHdfs should not support Hive partitioned table URL: https://github.com/apache/spark/pull/25584#issuecomment-524883020 **[Test build #109737 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109737/testReport)** for PR 25584 at commit [`a9eea2a`](https://github.com/apache/spark/commit/a9eea2adbe69bc105b239a1289d6c0d92189e7a2).
[GitHub] [spark] SparkQA commented on issue #25581: [SPARK-28495][SQL] Introduce ANSI store assignment policy for table insertion
SparkQA commented on issue #25581: [SPARK-28495][SQL] Introduce ANSI store assignment policy for table insertion URL: https://github.com/apache/spark/pull/25581#issuecomment-524883025 **[Test build #109738 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109738/testReport)** for PR 25581 at commit [`e2b3754`](https://github.com/apache/spark/commit/e2b37544b4888008ab92c554bb2dfe51ecae4b35).
[GitHub] [spark] AmplabJenkins removed a comment on issue #25581: [SPARK-28495][SQL] Introduce ANSI store assignment policy for table insertion
AmplabJenkins removed a comment on issue #25581: [SPARK-28495][SQL] Introduce ANSI store assignment policy for table insertion URL: https://github.com/apache/spark/pull/25581#issuecomment-524882189 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14785/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25581: [SPARK-28495][SQL] Introduce ANSI store assignment policy for table insertion
AmplabJenkins removed a comment on issue #25581: [SPARK-28495][SQL] Introduce ANSI store assignment policy for table insertion URL: https://github.com/apache/spark/pull/25581#issuecomment-524882174 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25581: [SPARK-28495][SQL] Introduce ANSI store assignment policy for table insertion
AmplabJenkins commented on issue #25581: [SPARK-28495][SQL] Introduce ANSI store assignment policy for table insertion URL: https://github.com/apache/spark/pull/25581#issuecomment-524882189 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14785/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25581: [SPARK-28495][SQL] Introduce ANSI store assignment policy for table insertion
AmplabJenkins commented on issue #25581: [SPARK-28495][SQL] Introduce ANSI store assignment policy for table insertion URL: https://github.com/apache/spark/pull/25581#issuecomment-524882174 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25584: [SPARK-28876][SQL] fallBackToHdfs should not support Hive partitioned table
AmplabJenkins commented on issue #25584: [SPARK-28876][SQL] fallBackToHdfs should not support Hive partitioned table URL: https://github.com/apache/spark/pull/25584#issuecomment-524882062 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14784/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25584: [SPARK-28876][SQL] fallBackToHdfs should not support Hive partitioned table
AmplabJenkins commented on issue #25584: [SPARK-28876][SQL] fallBackToHdfs should not support Hive partitioned table URL: https://github.com/apache/spark/pull/25584#issuecomment-524882056 Merged build finished. Test PASSed.
[GitHub] [spark] wangyum opened a new pull request #25584: [SPARK-28876][SQL] fallBackToHdfs should not support Hive partitioned table
wangyum opened a new pull request #25584: [SPARK-28876][SQL] fallBackToHdfs should not support Hive partitioned table URL: https://github.com/apache/spark/pull/25584

### What changes were proposed in this pull request?
This PR makes `spark.sql.statistics.fallBackToHdfs` not support Hive partitioned tables.

### Why are the changes needed?
The current implementation is incorrect for external partitions, and supporting partitioned tables with external partitions would be expensive.

### Does this PR introduce any user-facing change?
Yes, but I do not think it will change the join strategy, because partitioned tables are usually very large.

### How was this patch tested?
Unit test.
[GitHub] [spark] srowen commented on a change in pull request #25545: [SPARK-28843][PYTHON] Set OMP_NUM_THREADS to executor cores for python
srowen commented on a change in pull request #25545: [SPARK-28843][PYTHON] Set OMP_NUM_THREADS to executor cores for python URL: https://github.com/apache/spark/pull/25545#discussion_r317627588

## File path: core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala

@@ -106,6 +106,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
     val startTime = System.currentTimeMillis
     val env = SparkEnv.get
     val localdir = env.blockManager.diskBlockManager.localDirs.map(f => f.getPath()).mkString(",")
+    // if OMP_NUM_THREADS is not explicitly set, override it with the number of cores
+    if (conf.getOption("spark.executorEnv.OMP_NUM_THREADS").isEmpty) {
+      // SPARK-28843: limit the OpenMP thread pool to the number of cores assigned to this executor
+      // this avoids high memory consumption with pandas/numpy because of a large OpenMP thread pool
+      // see https://github.com/numpy/numpy/issues/10455
+      conf.getOption("spark.executor.cores").foreach(envVars.put("OMP_NUM_THREADS", _))

Review comment: 1 is just the default in YARN; the meaning is the same everywhere. Let me break it down further, to illustrate. All this is doing is ensuring that a process isn't using more cores than it should, which ought to always be a good thing. (And it saves memory along the way.) Suppose there's a 16-core machine.

Case 1: JVM Spark
a) `spark.executor.cores` = 16. There is one JVM using all cores.
b) `spark.executor.cores` = 4. There are (up to) 4 JVMs using 4 cores each.

Case 2: Pyspark
a) `spark.executor.cores` = 16. There are 16 Python processes.
b) `spark.executor.cores` = 4. There are still 16 Python processes.

In case 1a, imagine using MLlib that uses OpenBLAS or MKL. By default, OpenMP will use all 16 cores now. This is fine, and does not change with this change. In case 1b, each JVM will use 16 cores, so OpenMP will attempt to use 64 total threads (to my understanding here), right now. This change would make this use 16 cores (4 x 4). That's better.
In cases 2a and 2b, imagine using numpy. 256 threads will be used in total on the machine! That's bad: it's a little slower because of all the context switching, and it also uses more memory. This change does not, however, help case 2a. It does help 2b, where 'at least' only 64 threads are started. The more aggressive change would be to set the default to 1, always, for Pyspark as well, since that matches the execution model better. However, this is at least a more conservative step that merely caps it at the number of allocated executor cores. Yes, the situation isn't as bad if the executor isn't actually fully utilized, but I don't think we should optimize for that case; at least, this more conservative change still errs on the side of over-committing the cores at the cost of memory, just not nearly as extremely as the default.
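The conservative default described above boils down to: leave OMP_NUM_THREADS alone if the user set it, otherwise cap it at `spark.executor.cores`. A hedged Python-side sketch of that logic and of the thread arithmetic from the example (illustrative only; the actual change lives in `PythonRunner.scala` and sets the variable in the worker's environment):

```python
import os

def configure_omp(executor_cores, env=None):
    """Cap OpenMP's thread pool at the executor's core allotment unless the
    user has set OMP_NUM_THREADS explicitly. Must run before numpy (and its
    OpenMP-backed BLAS) is first imported to have any effect."""
    if env is None:
        env = os.environ
    env.setdefault("OMP_NUM_THREADS", str(executor_cores))
    return int(env["OMP_NUM_THREADS"])

# Case 2b on a 16-core box: 16 Python workers regardless of executor.cores = 4.
workers = 16
threads_uncapped = workers * 16                       # 256 threads machine-wide
threads_capped = workers * configure_omp(4, env={})   # 64 with the cap
```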
[GitHub] [spark] AmplabJenkins removed a comment on issue #25104: [SPARK-28341][SQL] create a public API for V2SessionCatalog
AmplabJenkins removed a comment on issue #25104: [SPARK-28341][SQL] create a public API for V2SessionCatalog URL: https://github.com/apache/spark/pull/25104#issuecomment-524879679 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/109730/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25104: [SPARK-28341][SQL] create a public API for V2SessionCatalog
AmplabJenkins removed a comment on issue #25104: [SPARK-28341][SQL] create a public API for V2SessionCatalog URL: https://github.com/apache/spark/pull/25104#issuecomment-524879672 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #25104: [SPARK-28341][SQL] create a public API for V2SessionCatalog
AmplabJenkins commented on issue #25104: [SPARK-28341][SQL] create a public API for V2SessionCatalog URL: https://github.com/apache/spark/pull/25104#issuecomment-524879672 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #25104: [SPARK-28341][SQL] create a public API for V2SessionCatalog
AmplabJenkins commented on issue #25104: [SPARK-28341][SQL] create a public API for V2SessionCatalog URL: https://github.com/apache/spark/pull/25104#issuecomment-524879679 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/109730/ Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #25104: [SPARK-28341][SQL] create a public API for V2SessionCatalog
SparkQA removed a comment on issue #25104: [SPARK-28341][SQL] create a public API for V2SessionCatalog URL: https://github.com/apache/spark/pull/25104#issuecomment-524824809 **[Test build #109730 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109730/testReport)** for PR 25104 at commit [`e6a18ac`](https://github.com/apache/spark/commit/e6a18accc7cbc84ebf738ad23d4dd3864e8d4de5).
[GitHub] [spark] SparkQA commented on issue #25104: [SPARK-28341][SQL] create a public API for V2SessionCatalog
SparkQA commented on issue #25104: [SPARK-28341][SQL] create a public API for V2SessionCatalog URL: https://github.com/apache/spark/pull/25104#issuecomment-524879351 **[Test build #109730 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109730/testReport)** for PR 25104 at commit [`e6a18ac`](https://github.com/apache/spark/commit/e6a18accc7cbc84ebf738ad23d4dd3864e8d4de5).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `public abstract class CatalogExtension implements TableCatalog`
  * `class CatalogManager(conf: SQLConf, sessionCatalog: TableCatalog) extends Logging`
  * `class V2SessionCatalog(catalog: SessionCatalog, conf: SQLConf) extends TableCatalog`
[GitHub] [spark] cloud-fan commented on issue #25347: [SPARK-28610][SQL] Allow having a decimal buffer for long sum
cloud-fan commented on issue #25347: [SPARK-28610][SQL] Allow having a decimal buffer for long sum URL: https://github.com/apache/spark/pull/25347#issuecomment-524876439 @mgaido91 can you do a simple microbenchmark? If the performance overhead is not significant, we can use decimal as the sum buffer and provide a legacy config to use long as the buffer.
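A rough sketch of the kind of microbenchmark being requested: compare summing into a `Long` buffer versus a `java.math.BigDecimal` buffer. This is illustrative only; a real Spark benchmark would go through Spark's `Benchmark` utility and the actual `Sum` aggregate, not a hand-rolled loop, and `SumBufferBench` and its helpers are names invented for this sketch.

```scala
// Illustrative micro-timing of long vs. decimal accumulation buffers.
object SumBufferBench {
  def time[A](label: String)(f: => A): A = {
    val start = System.nanoTime()
    val result = f
    println(s"$label: ${(System.nanoTime() - start) / 1e6} ms")
    result
  }

  def main(args: Array[String]): Unit = {
    val n = 10000000
    // Long buffer: cheap per-row add, but can silently overflow.
    val longSum = time("long buffer") {
      var acc = 0L
      var i = 0
      while (i < n) { acc += i; i += 1 }
      acc
    }
    // BigDecimal buffer: overflow-safe, but each add allocates a new object.
    val decSum = time("decimal buffer") {
      var acc = java.math.BigDecimal.ZERO
      var i = 0
      while (i < n) { acc = acc.add(java.math.BigDecimal.valueOf(i.toLong)); i += 1 }
      acc
    }
    assert(decSum.longValueExact() == longSum) // both compute n*(n-1)/2
  }
}
```

The interesting number is the ratio between the two timings: if the decimal buffer is only modestly slower, the overflow-safe default plus a legacy config becomes an easy sell.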
[GitHub] [spark] srowen commented on a change in pull request #25583: [MINOR][SS] Reuse KafkaSourceInitialOffsetWriter to deduplicate
srowen commented on a change in pull request #25583: [MINOR][SS] Reuse KafkaSourceInitialOffsetWriter to deduplicate
URL: https://github.com/apache/spark/pull/25583#discussion_r317621710

File path: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceInitialOffsetWriter.scala

```scala
@@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.kafka010
+
+import java.io._
+import java.nio.charset.StandardCharsets
+
+import org.apache.commons.io.IOUtils
+
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.execution.streaming.{HDFSMetadataLog, SerializedOffset}
+
+/** A version of [[HDFSMetadataLog]] specialized for saving the initial offsets. */
+private[kafka010] class KafkaSourceInitialOffsetWriter(
+    sparkSession: SparkSession,
+    metadataPath: String)
+  extends HDFSMetadataLog[KafkaSourceOffset](sparkSession, metadataPath) {
+
+  val VERSION = 1
+
+  override def serialize(metadata: KafkaSourceOffset, out: OutputStream): Unit = {
+    out.write(0) // A zero byte is written to support Spark 2.1.0 (SPARK-19517)
+    val writer = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8))
+    writer.write("v" + VERSION + "\n")
```

Review comment: And while we're here, you could use interpolation here.
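The interpolation suggestion above amounts to replacing the `"+"` concatenation with an `s`-interpolated string. A minimal sketch, with `VERSION` standing in for the class's field:

```scala
// Minimal sketch of the suggested change: Scala string interpolation instead
// of "+" concatenation. VERSION is a stand-in for the class's field.
object InterpolationExample {
  val VERSION = 1

  def main(args: Array[String]): Unit = {
    val concatenated = "v" + VERSION + "\n" // current form
    val interpolated = s"v$VERSION\n"       // suggested form
    assert(concatenated == interpolated)    // both produce "v1\n"
    print(interpolated)
  }
}
```

The two forms produce identical output; the interpolated form is just easier to read and is the idiomatic Scala style.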
[GitHub] [spark] srowen commented on a change in pull request #25583: [MINOR][SS] Reuse KafkaSourceInitialOffsetWriter to deduplicate
srowen commented on a change in pull request #25583: [MINOR][SS] Reuse KafkaSourceInitialOffsetWriter to deduplicate
URL: https://github.com/apache/spark/pull/25583#discussion_r317621165

File path: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceInitialOffsetWriter.scala

```scala
@@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.kafka010
+
+import java.io._
+import java.nio.charset.StandardCharsets
+
+import org.apache.commons.io.IOUtils
+
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.execution.streaming.{HDFSMetadataLog, SerializedOffset}
+
+/** A version of [[HDFSMetadataLog]] specialized for saving the initial offsets. */
+private[kafka010] class KafkaSourceInitialOffsetWriter(
+    sparkSession: SparkSession,
+    metadataPath: String)
+  extends HDFSMetadataLog[KafkaSourceOffset](sparkSession, metadataPath) {
+
+  val VERSION = 1
+
+  override def serialize(metadata: KafkaSourceOffset, out: OutputStream): Unit = {
+    out.write(0) // A zero byte is written to support Spark 2.1.0 (SPARK-19517)
+    val writer = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8))
+    writer.write("v" + VERSION + "\n")
+    writer.write(metadata.json)
+    writer.flush
+  }
+
+  override def deserialize(in: InputStream): KafkaSourceOffset = {
+    in.read() // A zero byte is read to support Spark 2.1.0 (SPARK-19517)
+    val content = IOUtils.toString(new InputStreamReader(in, StandardCharsets.UTF_8))
+    // HDFSMetadataLog guarantees that it never creates a partial file.
+    assert(content.length != 0)
+    if (content(0) == 'v') {
+      val indexOfNewLine = content.indexOf("\n")
+      if (indexOfNewLine > 0) {
+        validateVersion(content.substring(0, indexOfNewLine), VERSION)
+        KafkaSourceOffset(SerializedOffset(content.substring(indexOfNewLine + 1)))
+      } else {
+        throw new IllegalStateException(
+          s"Log file was malformed: failed to detect the log file version line.")
```

Review comment: Nit: no need for interpolation (the message contains no interpolated values, so the `s` prefix is unnecessary).
[GitHub] [spark] srowen commented on a change in pull request #25583: [MINOR][SS] Reuse KafkaSourceInitialOffsetWriter to deduplicate
srowen commented on a change in pull request #25583: [MINOR][SS] Reuse KafkaSourceInitialOffsetWriter to deduplicate
URL: https://github.com/apache/spark/pull/25583#discussion_r317621622

File path: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaSourceInitialOffsetWriter.scala

```scala
@@ -0,0 +1,63 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.kafka010
+
+import java.io._
+import java.nio.charset.StandardCharsets
+
+import org.apache.commons.io.IOUtils
+
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.execution.streaming.{HDFSMetadataLog, SerializedOffset}
+
+/** A version of [[HDFSMetadataLog]] specialized for saving the initial offsets. */
+private[kafka010] class KafkaSourceInitialOffsetWriter(
+    sparkSession: SparkSession,
+    metadataPath: String)
+  extends HDFSMetadataLog[KafkaSourceOffset](sparkSession, metadataPath) {
+
+  val VERSION = 1
+
+  override def serialize(metadata: KafkaSourceOffset, out: OutputStream): Unit = {
+    out.write(0) // A zero byte is written to support Spark 2.1.0 (SPARK-19517)
+    val writer = new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8))
+    writer.write("v" + VERSION + "\n")
+    writer.write(metadata.json)
+    writer.flush
+  }
+
+  override def deserialize(in: InputStream): KafkaSourceOffset = {
+    in.read() // A zero byte is read to support Spark 2.1.0 (SPARK-19517)
+    val content = IOUtils.toString(new InputStreamReader(in, StandardCharsets.UTF_8))
+    // HDFSMetadataLog guarantees that it never creates a partial file.
+    assert(content.length != 0)
```

Review comment: Could be `.nonEmpty`, but I'm not even sure we want asserts here.
[GitHub] [spark] cloud-fan commented on a change in pull request #25581: [SPARK-28495][SQL] Introduce ANSI store assignment policy for table insertion
cloud-fan commented on a change in pull request #25581: [SPARK-28495][SQL] Introduce ANSI store assignment policy for table insertion URL: https://github.com/apache/spark/pull/25581#discussion_r317619714 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DataType.scala @@ -371,12 +373,14 @@ object DataType { byName: Boolean, resolver: Resolver, context: String, + storeAssignmentPolicy: StoreAssignmentPolicy.Value, Review comment: maybe pass in a boolean flag `isStrict`?
[GitHub] [spark] cloud-fan commented on a change in pull request #25581: [SPARK-28495][SQL] Introduce ANSI store assignment policy for table insertion
cloud-fan commented on a change in pull request #25581: [SPARK-28495][SQL] Introduce ANSI store assignment policy for table insertion
URL: https://github.com/apache/spark/pull/25581#discussion_r317619345

File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala

```scala
@@ -158,6 +158,35 @@ object Cast {
     case _ => false
   }
+
+  def canANSIStoreAssign(from: DataType, to: DataType): Boolean = (from, to) match {
+    case _ if from == to => true
+    case (_: NumericType, _: NumericType) => true
+    case (_, StringType) => true
```

Review comment: shall we only allow AtomicType to string?
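The review question above suggests narrowing the catch-all string case so only atomic source types can be stored into a string column. A hypothetical sketch of the revised match, using simplified stand-in types rather than the real Catalyst `DataType` hierarchy:

```scala
// Hypothetical sketch of the reviewer's suggestion with simplified stand-in
// types; the real code pattern-matches on Catalyst's DataType hierarchy.
sealed trait DataType
trait AtomicType extends DataType
trait NumericType extends AtomicType
case object IntegerType extends NumericType
case object StringType extends AtomicType
case object ArrayType extends DataType // a complex, non-atomic type

object AnsiStoreAssign {
  def canANSIStoreAssign(from: DataType, to: DataType): Boolean = (from, to) match {
    case _ if from == to => true
    case (_: NumericType, _: NumericType) => true
    // Narrowed per the review comment: only atomic types may be assigned to
    // string, replacing the original catch-all `case (_, StringType) => true`.
    case (_: AtomicType, StringType) => true
    case _ => false
  }

  def main(args: Array[String]): Unit = {
    assert(canANSIStoreAssign(IntegerType, StringType))  // atomic -> string: allowed
    assert(!canANSIStoreAssign(ArrayType, StringType))   // complex -> string: rejected
  }
}
```

With the catch-all, a complex type such as an array would also be assignable to a string column; the narrowed case rejects it, which is the behavioral difference the comment is asking about.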
[GitHub] [spark] AmplabJenkins removed a comment on issue #25583: [MINOR][SS] Reuse KafkaSourceInitialOffsetWriter to deduplicate
AmplabJenkins removed a comment on issue #25583: [MINOR][SS] Reuse KafkaSourceInitialOffsetWriter to deduplicate URL: https://github.com/apache/spark/pull/25583#issuecomment-524872836 Can one of the admins verify this patch?
[GitHub] [spark] SparkQA commented on issue #25583: [MINOR][SS] Reuse KafkaSourceInitialOffsetWriter to deduplicate
SparkQA commented on issue #25583: [MINOR][SS] Reuse KafkaSourceInitialOffsetWriter to deduplicate URL: https://github.com/apache/spark/pull/25583#issuecomment-524873878 **[Test build #109736 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109736/testReport)** for PR 25583 at commit [`1daf77b`](https://github.com/apache/spark/commit/1daf77b5822260a7b5d1add98ba6b09d4151ffb3).
[GitHub] [spark] AmplabJenkins removed a comment on issue #25583: [MINOR][SS] Reuse KafkaSourceInitialOffsetWriter to deduplicate
AmplabJenkins removed a comment on issue #25583: [MINOR][SS] Reuse KafkaSourceInitialOffsetWriter to deduplicate URL: https://github.com/apache/spark/pull/25583#issuecomment-524872616 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #25583: [MINOR][SS] Reuse KafkaSourceInitialOffsetWriter to deduplicate
AmplabJenkins commented on issue #25583: [MINOR][SS] Reuse KafkaSourceInitialOffsetWriter to deduplicate URL: https://github.com/apache/spark/pull/25583#issuecomment-524872836 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #25583: [MINOR][SS] Reuse KafkaSourceInitialOffsetWriter to deduplicate
AmplabJenkins commented on issue #25583: [MINOR][SS] Reuse KafkaSourceInitialOffsetWriter to deduplicate URL: https://github.com/apache/spark/pull/25583#issuecomment-524872616 Can one of the admins verify this patch?
[GitHub] [spark] HeartSaVioR opened a new pull request #25583: [MINOR][SS] Reuse KafkaSourceInitialOffsetWriter to deduplicate
HeartSaVioR opened a new pull request #25583: [MINOR][SS] Reuse KafkaSourceInitialOffsetWriter to deduplicate
URL: https://github.com/apache/spark/pull/25583

### What changes were proposed in this pull request?

This patch proposes to reuse KafkaSourceInitialOffsetWriter to remove identical code in KafkaSource. Credit to @jaceklaskowski for finding this. https://lists.apache.org/thread.html/7faa6ac29d871444eaeccefc520e3543a77f4362af4bb0f12a3f7cb2@%3Cdev.spark.apache.org%3E

### Why are the changes needed?

The code is duplicated verbatim, which means the two copies have to be maintained separately and a bug might end up being fixed on only one side.

### Does this PR introduce any user-facing change?

No

### How was this patch tested?

Existing UTs, as it's a simple refactor.
[GitHub] [spark] attilapiros commented on a change in pull request #25299: [SPARK-27651][Core] Avoid the network when shuffle blocks are fetched from the same host
attilapiros commented on a change in pull request #25299: [SPARK-27651][Core] Avoid the network when shuffle blocks are fetched from the same host
URL: https://github.com/apache/spark/pull/25299#discussion_r317614716

File path: core/src/main/scala/org/apache/spark/storage/BlockManager.scala

```scala
@@ -995,6 +1003,19 @@ private[spark] class BlockManager(
     None
   }
+
+  private[spark] def getHostLocalDirs(executorIds: Array[String])
+    : scala.collection.Map[String, Array[String]] = {
+    val cachedItems = executorIdToLocalDirsCache.filterKeys(executorIds.contains(_))
+    if (cachedItems.size < executorIds.length) {
+      val notCachedItems = master
+        .getHostLocalDirs(executorIds.filterNot(executorIdToLocalDirsCache.contains))
+      executorIdToLocalDirsCache ++= notCachedItems
```

Review comment: Yes, it is unbounded, but this case is the very same as what we have at the external shuffle service: only those executors which are or were running on the same host are stored here. I assume this number is quite small, but we can use the same limit here if you think it is still worth doing.
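The "same limit" mentioned above would amount to capping the executorId-to-localDirs cache. A hedged sketch of one way to bound it, using Java's access-ordered `LinkedHashMap` for LRU eviction; the class name, method names, and cap are invented for this illustration and are not part of the PR:

```scala
// Hypothetical size-capped cache of executorId -> local dirs, evicting the
// least-recently-used entry once maxEntries is exceeded.
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}

class BoundedLocalDirsCache(maxEntries: Int) {
  private val cache =
    new JLinkedHashMap[String, Array[String]](16, 0.75f, /* accessOrder = */ true) {
      override def removeEldestEntry(e: JMap.Entry[String, Array[String]]): Boolean =
        size() > maxEntries
    }

  def get(executorId: String): Option[Array[String]] = Option(cache.get(executorId))
  def put(executorId: String, dirs: Array[String]): Unit = cache.put(executorId, dirs)
  def size: Int = cache.size()
}

object BoundedLocalDirsCacheDemo {
  def main(args: Array[String]): Unit = {
    val c = new BoundedLocalDirsCache(maxEntries = 2)
    c.put("exec-1", Array("/tmp/a"))
    c.put("exec-2", Array("/tmp/b"))
    c.get("exec-1")                  // touch exec-1 so exec-2 becomes the eldest
    c.put("exec-3", Array("/tmp/c")) // evicts exec-2
    assert(c.get("exec-2").isEmpty && c.get("exec-1").isDefined && c.size == 2)
  }
}
```

Since only same-host executors are ever cached, a small cap would be a cheap safety net rather than a behavioral change in the common case.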
[GitHub] [spark] attilapiros commented on a change in pull request #25299: [SPARK-27651][Core] Avoid the network when shuffle blocks are fetched from the same host
attilapiros commented on a change in pull request #25299: [SPARK-27651][Core] Avoid the network when shuffle blocks are fetched from the same host
URL: https://github.com/apache/spark/pull/25299#discussion_r317614594

File path: core/src/main/scala/org/apache/spark/storage/BlockManager.scala

```scala
@@ -995,6 +1003,19 @@ private[spark] class BlockManager(
     None
   }
+
+  private[spark] def getHostLocalDirs(executorIds: Array[String])
+    : scala.collection.Map[String, Array[String]] = {
```

Review comment: The predef `Map` is the immutable one. With just `Map` as the return type I would get:

```
Error:(1012, 19) type mismatch;
 found   : Map[String,Array[String]] (in scala.collection)
 required: Map[String,Array[String]] (in scala.collection.immutable)
    cachedItems ++ notCachedItems
```

So, to avoid an unnecessary conversion (a copy of the items), I decided to use the `Map` trait. See the `toMap` implementation:

```scala
def toMap[T, U](implicit ev: A <:< (T, U)): immutable.Map[T, U] = {
  val b = immutable.Map.newBuilder[T, U]
  for (x <- self) b += x
  b.result()
}
```
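The type mismatch described above can be reproduced in miniature. A hedged sketch, unrelated to the actual BlockManager code (`combine` and its arguments are made-up names):

```scala
// Minimal reproduction of the issue: `++` on a mutable map yields a
// scala.collection.mutable.Map, which does not conform to Predef's Map
// (scala.collection.immutable.Map). Declaring the wider
// scala.collection.Map return type accepts it without a toMap copy.
import scala.collection.mutable

object MapReturnTypeDemo {
  def combine(
      cached: mutable.Map[String, Array[String]],
      fetched: Map[String, Array[String]]): scala.collection.Map[String, Array[String]] =
    cached ++ fetched // would not compile if the return type were plain Map

  def main(args: Array[String]): Unit = {
    val cached = mutable.Map("exec-1" -> Array("/tmp/a"))
    val result = combine(cached, Map("exec-2" -> Array("/tmp/b")))
    assert(result.keySet == Set("exec-1", "exec-2"))
  }
}
```

Returning the read-only `scala.collection.Map` trait keeps the caller from mutating the cache while skipping the element-by-element copy that `toMap` performs.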
[GitHub] [spark] AmplabJenkins removed a comment on issue #25403: [SPARK-28679][YARN] changes to setResourceInformation to handle empty resources and reflection error handling
AmplabJenkins removed a comment on issue #25403: [SPARK-28679][YARN] changes to setResourceInformation to handle empty resources and reflection error handling URL: https://github.com/apache/spark/pull/25403#issuecomment-524868589 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/109735/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #25403: [SPARK-28679][YARN] changes to setResourceInformation to handle empty resources and reflection error handling
SparkQA removed a comment on issue #25403: [SPARK-28679][YARN] changes to setResourceInformation to handle empty resources and reflection error handling URL: https://github.com/apache/spark/pull/25403#issuecomment-524861704 **[Test build #109735 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109735/testReport)** for PR 25403 at commit [`298f89b`](https://github.com/apache/spark/commit/298f89b24768b2776a04309e641cb2758686234e).
[GitHub] [spark] AmplabJenkins removed a comment on issue #25403: [SPARK-28679][YARN] changes to setResourceInformation to handle empty resources and reflection error handling
AmplabJenkins removed a comment on issue #25403: [SPARK-28679][YARN] changes to setResourceInformation to handle empty resources and reflection error handling URL: https://github.com/apache/spark/pull/25403#issuecomment-524868574 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25403: [SPARK-28679][YARN] changes to setResourceInformation to handle empty resources and reflection error handling
AmplabJenkins commented on issue #25403: [SPARK-28679][YARN] changes to setResourceInformation to handle empty resources and reflection error handling URL: https://github.com/apache/spark/pull/25403#issuecomment-524868574 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25403: [SPARK-28679][YARN] changes to setResourceInformation to handle empty resources and reflection error handling
AmplabJenkins commented on issue #25403: [SPARK-28679][YARN] changes to setResourceInformation to handle empty resources and reflection error handling URL: https://github.com/apache/spark/pull/25403#issuecomment-524868589 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/109735/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #25403: [SPARK-28679][YARN] changes to setResourceInformation to handle empty resources and reflection error handling
SparkQA commented on issue #25403: [SPARK-28679][YARN] changes to setResourceInformation to handle empty resources and reflection error handling URL: https://github.com/apache/spark/pull/25403#issuecomment-524868447 **[Test build #109735 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109735/testReport)** for PR 25403 at commit [`298f89b`](https://github.com/apache/spark/commit/298f89b24768b2776a04309e641cb2758686234e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] HeartSaVioR commented on a change in pull request #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaDataConsumer
HeartSaVioR commented on a change in pull request #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaDataConsumer
URL: https://github.com/apache/spark/pull/22138#discussion_r317612424

## File path: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaDataConsumerSuite.scala ##

@@ -121,4 +125,158 @@ class KafkaDataConsumerSuite extends SharedSparkSession with PrivateMethodTester
     threadpool.shutdown()
   }
 }
+
+  test("SPARK-25151 Handles multiple tasks in executor fetching same (topic, partition) pair") {

Review comment: Hmm... that's actually dealing with a limitation of the current pool, and I guess another reviewer was pointing this out too, so I feel we've flagged it as a soft kind of bug. I don't mind whether we remove the JIRA number or not.
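The limitation being discussed — the pool can only make a best-effort attempt to respect capacity when several tasks borrow the same (topic, partition) key at once — can be illustrated with a minimal keyed-pool sketch. This is a hypothetical stand-in written for illustration, not the Apache Commons Pool `GenericKeyedObjectPool` API that the PR actually uses:

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical minimal keyed pool. When no idle instance exists for a key,
// borrow() creates a new one instead of blocking, so two tasks fetching the
// same (topic, partition) pair concurrently end up with two consumers:
// capacity is best-effort, not a hard cap.
class NaiveKeyedPool<K, V> {
    private final Map<K, ArrayDeque<V>> idle = new HashMap<>();
    private final Function<K, V> factory;
    private int created = 0;

    NaiveKeyedPool(Function<K, V> factory) {
        this.factory = factory;
    }

    synchronized V borrow(K key) {
        ArrayDeque<V> q = idle.get(key);
        if (q != null && !q.isEmpty()) {
            return q.poll();   // reuse an idle instance for this key
        }
        created++;             // nothing idle: create, even if over "capacity"
        return factory.apply(key);
    }

    synchronized void release(K key, V value) {
        idle.computeIfAbsent(key, k -> new ArrayDeque<>()).push(value);
    }

    synchronized int createdCount() {
        return created;
    }
}
```

Borrowing the same key twice before any release yields two distinct instances; only after a release does a later borrow reuse one, which is the behavior the test above exercises.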
[GitHub] [spark] viirya edited a comment on issue #25570: [SPARK-23519][SQL] create view should work from query with duplicate output columns
viirya edited a comment on issue #25570: [SPARK-23519][SQL] create view should work from query with duplicate output columns URL: https://github.com/apache/spark/pull/25570#issuecomment-524844659 Hmm, I am not sure whether ALTER VIEW AS should take the schema of the new query, or keep the original schema and just replace the old query. That's right, what you described is how ALTER VIEW AS works now. Hive's document https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterViewAsSelect doesn't cover the details; it just says it works like CREATE OR REPLACE VIEW. But I can't find a Hive doc for CREATE OR REPLACE VIEW. The comment `Nothing we need to retain from the old view...` is specific to CREATE OR REPLACE VIEW; I think it does not necessarily indicate how ALTER VIEW AS works. PostgreSQL has CREATE OR REPLACE VIEW, but interestingly, the new query in CREATE OR REPLACE VIEW must generate the same column names, in the same order, with the same data types. There is no ALTER VIEW AS in PostgreSQL. I thought ALTER VIEW AS only replaced the old query and kept the original schema, but it seems that isn't how ALTER VIEW AS works now.
[GitHub] [spark] HeartSaVioR commented on a change in pull request #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaDataConsumer
HeartSaVioR commented on a change in pull request #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaDataConsumer
URL: https://github.com/apache/spark/pull/22138#discussion_r317610953

## File path: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala ##

@@ -445,197 +529,68 @@ private[kafka010] case class InternalKafkaConsumer(
    * Throw an exception or log a warning as per `failOnDataLoss`.
    */
   private def reportDataLoss(
+      topicPartition: TopicPartition,
+      groupId: String,
       failOnDataLoss: Boolean,
       message: String,
       cause: Throwable = null): Unit = {
-    val finalMessage = s"$message ${additionalMessage(failOnDataLoss)}"
+    val finalMessage = s"$message ${additionalMessage(topicPartition, groupId, failOnDataLoss)}"
     reportDataLoss0(failOnDataLoss, finalMessage, cause)
   }

-  def close(): Unit = consumer.close()
-
-  private def seek(offset: Long): Unit = {
-    logDebug(s"Seeking to $groupId $topicPartition $offset")
-    consumer.seek(topicPartition, offset)
-  }
-
-  /**
-   * Poll messages from Kafka starting from `offset` and update `fetchedData`. `fetchedData` may be
-   * empty if the Kafka consumer fetches some messages but all of them are not visible messages
-   * (either transaction messages, or aborted messages when `isolation.level` is `read_committed`).
-   *
-   * @throws OffsetOutOfRangeException if `offset` is out of range.
-   * @throws TimeoutException if the consumer position is not changed after polling. It means the
-   *                          consumer polls nothing before timeout.
-   */
-  private def fetchData(offset: Long, pollTimeoutMs: Long): Unit = {
-    // Seek to the offset because we may call seekToBeginning or seekToEnd before this.
-    seek(offset)
-    val p = consumer.poll(pollTimeoutMs)
-    val r = p.records(topicPartition)
-    logDebug(s"Polled $groupId ${p.partitions()} ${r.size}")
-    val offsetAfterPoll = consumer.position(topicPartition)
-    logDebug(s"Offset changed from $offset to $offsetAfterPoll after polling")
-    fetchedData.withNewPoll(r.listIterator, offsetAfterPoll)
-    if (!fetchedData.hasNext) {
-      // We cannot fetch anything after `poll`. Two possible cases:
-      // - `offset` is out of range so that Kafka returns nothing. `OffsetOutOfRangeException` will
-      //   be thrown.
-      // - Cannot fetch any data before timeout. `TimeoutException` will be thrown.
-      // - Fetched something but all of them are not invisible. This is a valid case and let the
-      //   caller handles this.
-      val range = getAvailableOffsetRange()
-      if (offset < range.earliest || offset >= range.latest) {
-        throw new OffsetOutOfRangeException(
-          Map(topicPartition -> java.lang.Long.valueOf(offset)).asJava)
-      } else if (offset == offsetAfterPoll) {
-        throw new TimeoutException(
-          s"Cannot fetch record for offset $offset in $pollTimeoutMs milliseconds")
-      }
-    }
+  private def runUninterruptiblyIfPossible[T](body: => T): T = Thread.currentThread match {
+    case ut: UninterruptibleThread =>
+      ut.runUninterruptibly(body)
+    case _ =>
+      logWarning("KafkaDataConsumer is not running in UninterruptibleThread. " +
+        "It may hang when KafkaDataConsumer's methods are interrupted because of KAFKA-1894")
+      body
   }
 }

-private[kafka010] object KafkaDataConsumer extends Logging {
+  val UNKNOWN_OFFSET = -2L

   case class AvailableOffsetRange(earliest: Long, latest: Long)

-  private case class CachedKafkaDataConsumer(internalConsumer: InternalKafkaConsumer)
-    extends KafkaDataConsumer {
-    assert(internalConsumer.inUse) // make sure this has been set to true
-    override def release(): Unit = { KafkaDataConsumer.release(internalConsumer) }
-  }
-
-  private case class NonCachedKafkaDataConsumer(internalConsumer: InternalKafkaConsumer)
-    extends KafkaDataConsumer {
-    override def release(): Unit = { internalConsumer.close() }
-  }
-
-  private case class CacheKey(groupId: String, topicPartition: TopicPartition) {
+  case class CacheKey(groupId: String, topicPartition: TopicPartition) {
     def this(topicPartition: TopicPartition, kafkaParams: ju.Map[String, Object]) =
       this(kafkaParams.get(ConsumerConfig.GROUP_ID_CONFIG).asInstanceOf[String], topicPartition)
   }

-  // This cache has the following important properties.
-  // - We make a best-effort attempt to maintain the max size of the cache as configured capacity.
-  //   The capacity is not guaranteed to be maintained, especially when there are more active
-  //   tasks simultaneously using consumers than the capacity.
-  private lazy val cache = {
-    val conf = SparkEnv.get.conf
-    val capacity = conf.get(CONSUMER_CACHE_CAPACITY)
-    new ju.LinkedHashMap[CacheKey, InternalKafkaConsumer](capacity, 0.75f, true) {
-      override def removeEldestEntry(
[GitHub] [spark] viirya edited a comment on issue #25570: [SPARK-23519][SQL] create view should work from query with duplicate output columns
viirya edited a comment on issue #25570: [SPARK-23519][SQL] create view should work from query with duplicate output columns URL: https://github.com/apache/spark/pull/25570#issuecomment-524844659 Hmm, I am not sure whether ALTER VIEW AS should take the schema of the new query, or keep the original schema and just replace the old query. That's right, what you described is how ALTER VIEW AS works now. Hive's document https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterViewAsSelect doesn't cover the details; it just says it works like CREATE OR REPLACE VIEW. But I can't find a Hive doc for CREATE OR REPLACE VIEW. I assume it works like Spark SQL. PostgreSQL has CREATE OR REPLACE VIEW, but interestingly, the new query in CREATE OR REPLACE VIEW must generate the same column names, in the same order, with the same data types. There is no ALTER VIEW AS in PostgreSQL. I thought ALTER VIEW AS only replaced the old query and kept the original schema, but it seems that isn't how ALTER VIEW AS works now.
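The two candidate semantics being debated can be modeled abstractly. This is an illustrative sketch only, with hypothetical types rather than Spark's catalog API: CREATE OR REPLACE VIEW retains nothing from the old definition, while a keep-schema variant of ALTER VIEW AS would swap only the query text.

```java
// Hypothetical model of a view definition: a schema plus a defining query.
class ViewSemantics {
    static final class ViewDef {
        final String schema;
        final String query;
        ViewDef(String schema, String query) {
            this.schema = schema;
            this.query = query;
        }
    }

    // CREATE OR REPLACE VIEW semantics: nothing is retained from the old
    // view; the new definition's schema wins. Per the discussion, this is
    // also how Spark's ALTER VIEW AS currently behaves.
    static ViewDef createOrReplace(ViewDef oldView, String newSchema, String newQuery) {
        return new ViewDef(newSchema, newQuery);
    }

    // The alternative semantics considered above: keep the original schema
    // and only replace the defining query (roughly what PostgreSQL's
    // column-compatibility rule for CREATE OR REPLACE VIEW enforces).
    static ViewDef alterKeepingSchema(ViewDef oldView, String newQuery) {
        return new ViewDef(oldView.schema, newQuery);
    }
}
```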
[GitHub] [spark] SparkQA removed a comment on issue #25582: [SPARK-28875][DSTREAMS][SS][TESTS] Add Task retry tests to make sure new consumer used
SparkQA removed a comment on issue #25582: [SPARK-28875][DSTREAMS][SS][TESTS] Add Task retry tests to make sure new consumer used URL: https://github.com/apache/spark/pull/25582#issuecomment-524855919 **[Test build #109734 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109734/testReport)** for PR 25582 at commit [`d6602ec`](https://github.com/apache/spark/commit/d6602ece506469bc706f74d8b0e4eda4dc4d229b).
[GitHub] [spark] AmplabJenkins removed a comment on issue #25582: [SPARK-28875][DSTREAMS][SS][TESTS] Add Task retry tests to make sure new consumer used
AmplabJenkins removed a comment on issue #25582: [SPARK-28875][DSTREAMS][SS][TESTS] Add Task retry tests to make sure new consumer used URL: https://github.com/apache/spark/pull/25582#issuecomment-524865107 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25582: [SPARK-28875][DSTREAMS][SS][TESTS] Add Task retry tests to make sure new consumer used
AmplabJenkins removed a comment on issue #25582: [SPARK-28875][DSTREAMS][SS][TESTS] Add Task retry tests to make sure new consumer used URL: https://github.com/apache/spark/pull/25582#issuecomment-524865113 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/109734/
[GitHub] [spark] AmplabJenkins commented on issue #25582: [WIP][SPARK-28875][DSTREAMS][SS][TESTS] Add Task retry tests to make sure new consumer used
AmplabJenkins commented on issue #25582: [WIP][SPARK-28875][DSTREAMS][SS][TESTS] Add Task retry tests to make sure new consumer used URL: https://github.com/apache/spark/pull/25582#issuecomment-524865107 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25582: [WIP][SPARK-28875][DSTREAMS][SS][TESTS] Add Task retry tests to make sure new consumer used
AmplabJenkins commented on issue #25582: [WIP][SPARK-28875][DSTREAMS][SS][TESTS] Add Task retry tests to make sure new consumer used URL: https://github.com/apache/spark/pull/25582#issuecomment-524865113 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/109734/
[GitHub] [spark] SparkQA commented on issue #25582: [WIP][SPARK-28875][DSTREAMS][SS][TESTS] Add Task retry tests to make sure new consumer used
SparkQA commented on issue #25582: [WIP][SPARK-28875][DSTREAMS][SS][TESTS] Add Task retry tests to make sure new consumer used URL: https://github.com/apache/spark/pull/25582#issuecomment-524865022 **[Test build #109734 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/109734/testReport)** for PR 25582 at commit [`d6602ec`](https://github.com/apache/spark/commit/d6602ece506469bc706f74d8b0e4eda4dc4d229b). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] cloud-fan commented on a change in pull request #25507: [SPARK-28667][SQL] Support InsertInto through the V2SessionCatalog
cloud-fan commented on a change in pull request #25507: [SPARK-28667][SQL] Support InsertInto through the V2SessionCatalog
URL: https://github.com/apache/spark/pull/25507#discussion_r317608095

## File path: sql/core/src/test/scala/org/apache/spark/sql/sources/v2/V1WriteFallbackSuite.scala ##

@@ -68,13 +69,25 @@ class V1WriteFallbackSuite extends QueryTest with SharedSparkSession with Before
 }

 class V1WriteFallbackSessionCatalogSuite
-  extends SessionCatalogTest[InMemoryTableWithV1Fallback, V1FallbackTableCatalog] {
+  extends InsertIntoTests(supportsDynamicOverwrite = false, includeSQLTests = true)
+  with SessionCatalogTest[InMemoryTableWithV1Fallback, V1FallbackTableCatalog] {
+
   override protected val v2Format = classOf[InMemoryV1Provider].getName
   override protected val catalogClassName: String = classOf[V1FallbackTableCatalog].getName
+  override protected val catalogAndNamespace: String = ""

   override protected def verifyTable(tableName: String, expected: DataFrame): Unit = {
     checkAnswer(InMemoryV1Provider.getTableData(spark, s"default.$tableName"), expected)
   }
+
+  protected def doInsert(tableName: String, insert: DataFrame, mode: SaveMode): Unit = {
+    val tmpView = "tmp_view"
+    withTempView(tmpView) {
+      insert.createOrReplaceTempView(tmpView)
+      val overwrite = if (mode == SaveMode.Overwrite) "OVERWRITE" else "INTO"
+      sql(s"INSERT $overwrite TABLE $tableName SELECT * FROM $tmpView")

Review comment: OK, it seems we do want to run only the SQL test cases here. Shouldn't we extend `InsertIntoSQLTests` instead of `InsertIntoTests`?
[GitHub] [spark] HeartSaVioR commented on a change in pull request #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaDataConsumer
HeartSaVioR commented on a change in pull request #22138: [SPARK-25151][SS] Apply Apache Commons Pool to KafkaDataConsumer
URL: https://github.com/apache/spark/pull/22138#discussion_r317608101

## File path: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala ##

@@ -18,228 +18,253 @@ package org.apache.spark.sql.kafka010
 import java.{util => ju}
+import java.io.Closeable
 import java.util.concurrent.TimeoutException

 import scala.collection.JavaConverters._

 import org.apache.kafka.clients.consumer.{ConsumerConfig, ConsumerRecord, KafkaConsumer, OffsetOutOfRangeException}
 import org.apache.kafka.common.TopicPartition

-import org.apache.spark.{SparkEnv, SparkException, TaskContext}
+import org.apache.spark.TaskContext
 import org.apache.spark.internal.Logging
 import org.apache.spark.kafka010.KafkaConfigUpdater
-import org.apache.spark.sql.kafka010.KafkaDataConsumer.AvailableOffsetRange
+import org.apache.spark.sql.kafka010.KafkaDataConsumer.{AvailableOffsetRange, UNKNOWN_OFFSET}
 import org.apache.spark.sql.kafka010.KafkaSourceProvider._
-import org.apache.spark.util.UninterruptibleThread
+import org.apache.spark.util.{ShutdownHookManager, UninterruptibleThread}
+
+/**
+ * This class simplifies the usages of Kafka consumer in Spark SQL Kafka connector.
+ *
+ * NOTE: Like KafkaConsumer, this class is not thread-safe.
+ * NOTE for contributors: It is possible for the instance to be used from multiple callers,
+ * so all the methods should not rely on current cursor and use seek manually.
+ */
+private[kafka010] class InternalKafkaConsumer(
+    val topicPartition: TopicPartition,
+    val kafkaParams: ju.Map[String, Object]) extends Closeable with Logging {
+
+  val groupId = kafkaParams.get(ConsumerConfig.GROUP_ID_CONFIG).asInstanceOf[String]
+
+  private val consumer = createConsumer

-private[kafka010] sealed trait KafkaDataConsumer {
   /**
-   * Get the record for the given offset if available.
-   *
-   * If the record is invisible (either a
-   * transaction message, or an aborted message when the consumer's `isolation.level` is
-   * `read_committed`), it will be skipped and this method will try to fetch next available record
-   * within [offset, untilOffset).
+   * Poll messages from Kafka starting from `offset` and returns a pair of "list of consumer record"
+   * and "offset after poll". The list of consumer record may be empty if the Kafka consumer fetches
+   * some messages but all of them are not visible messages (either transaction messages,
+   * or aborted messages when `isolation.level` is `read_committed`).
    *
-   * This method also will try its best to detect data loss. If `failOnDataLoss` is `true`, it will
-   * throw an exception when we detect an unavailable offset. If `failOnDataLoss` is `false`, this
-   * method will try to fetch next available record within [offset, untilOffset).
-   *
-   * When this method tries to skip offsets due to either invisible messages or data loss and
-   * reaches `untilOffset`, it will return `null`.
-   *
-   * @param offset         the offset to fetch.
-   * @param untilOffset    the max offset to fetch. Exclusive.
-   * @param pollTimeoutMs  timeout in milliseconds to poll data from Kafka.
-   * @param failOnDataLoss When `failOnDataLoss` is `true`, this method will either return record at
-   *                       offset if available, or throw exception.when `failOnDataLoss` is `false`,
-   *                       this method will either return record at offset if available, or return
-   *                       the next earliest available record less than untilOffset, or null. It
-   *                       will not throw any exception.
+   * @throws OffsetOutOfRangeException if `offset` is out of range.
+   * @throws TimeoutException if the consumer position is not changed after polling. It means the
+   *                          consumer polls nothing before timeout.
    */
-  def get(
-      offset: Long,
-      untilOffset: Long,
-      pollTimeoutMs: Long,
-      failOnDataLoss: Boolean): ConsumerRecord[Array[Byte], Array[Byte]] = {
-    internalConsumer.get(offset, untilOffset, pollTimeoutMs, failOnDataLoss)
+  def fetch(offset: Long, pollTimeoutMs: Long)
+    : (ju.List[ConsumerRecord[Array[Byte], Array[Byte]]], Long) = {
+    // Seek to the offset because we may call seekToBeginning or seekToEnd before this.
+    seek(offset)
+    val p = consumer.poll(pollTimeoutMs)
+    val r = p.records(topicPartition)
+    logDebug(s"Polled $groupId ${p.partitions()} ${r.size}")
+    val offsetAfterPoll = consumer.position(topicPartition)
+    logDebug(s"Offset changed from $offset to $offsetAfterPoll after polling")
+    val fetchedData = (r, offsetAfterPoll)
+    if (r.isEmpty) {
+      // We cannot fetch anything after `poll`. Two possible cases:
+      // - `offset` is out of range so that Kafka returns nothing. `OffsetOutOfRangeException` wi
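The empty-poll handling in the hunk above distinguishes three outcomes. The branching can be distilled into a small sketch; this is a hypothetical helper that mirrors the conditions in `fetch`, with the exceptions reduced to string labels for illustration:

```java
// Hypothetical distillation of the empty-poll branching in fetch() above.
// earliest/latest describe the currently available offset range for the
// partition; offsetAfterPoll is the consumer position after poll() returned
// no visible records.
class EmptyPollClassifier {
    static String classify(long offset, long offsetAfterPoll,
                           long earliest, long latest) {
        if (offset < earliest || offset >= latest) {
            // Kafka has nothing at this offset: out-of-range (possible data loss).
            return "OffsetOutOfRangeException";
        } else if (offset == offsetAfterPoll) {
            // Position did not move: the poll timed out without fetching anything.
            return "TimeoutException";
        } else {
            // Records were fetched but all were invisible (transaction markers or
            // aborted messages under read_committed); the caller handles this case.
            return "empty-but-valid";
        }
    }
}
```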
[GitHub] [spark] HyukjinKwon closed pull request #25567: [SPARK-28527][SQL][TEST] Re-run all the tests in SQLQueryTestSuite via Thrift Server
HyukjinKwon closed pull request #25567: [SPARK-28527][SQL][TEST] Re-run all the tests in SQLQueryTestSuite via Thrift Server URL: https://github.com/apache/spark/pull/25567