[GitHub] spark pull request: [SPARK-12705] [SQL] AnalysisException: Sorting...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10678#issuecomment-171217347 **[Test build #49307 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49307/consoleFull)** for PR 10678 at commit [`27fcaa5`](https://github.com/apache/spark/commit/27fcaa5ad6a3b4228ef4fc46b963c1e818d2f5c4). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-12791][SQL] Simplify CaseWhen by breaki...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10734#issuecomment-171219213 **[Test build #49303 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49303/consoleFull)** for PR 10734 at commit [`a3a9103`](https://github.com/apache/spark/commit/a3a9103371c619877a72ebfba5baa32b6b593399). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-10264][Documentation, ML] Added Since a...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/8532#issuecomment-171223121 Close this PR please @tijoparacka if you're not updating it
[GitHub] spark pull request: [HotFox] HotFix for scala style in KinesisBack...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/10738#issuecomment-171238791 ping @rxin @davies
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-171240572 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-171240574 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49305/ Test PASSed.
[GitHub] spark pull request: [SPARK-12791][SQL] Simplify CaseWhen by breaki...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10734#issuecomment-171242036 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49308/ Test FAILed.
[GitHub] spark pull request: [HotFox] HotFix for scala style in KinesisBack...
Github user dragos commented on the pull request: https://github.com/apache/spark/pull/10738#issuecomment-171249831 LGTM
[GitHub] spark pull request: [SPARK-12796] [SQL] Whole stage codegen
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/10735#discussion_r49560601
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/BenchmarkWholeStageCodegen.scala ---
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution
+
+import org.apache.spark.sql.SQLContext
+import org.apache.spark.util.Benchmark
+import org.apache.spark.{SparkFunSuite, SparkConf, SparkContext}
+
+/**
+ * Benchmark to measure whole stage codegen performance.
+ * To run this:
+ *  build/sbt sql/test-only *BenchmarkWholeStageCodegen
+ */
+class BenchmarkWholeStageCodegen extends SparkFunSuite {
+  val conf = new SparkConf()
+  val sc = new SparkContext("local[1]", "test-sql-context", conf)
+  val sqlContext = new SQLContext(sc)
+
+  def testWholeStage(values: Int): Unit = {
+    val benchmark = new Benchmark("Single Int Column Scan", values)
+
+    benchmark.addCase("Without whole stage codegen") { iter =>
--- End diff --

Benchmark will run each case 5 times in a row, so I think the order should not matter, or all the results we see should be re-visited.

cc @nongli
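
The ordering question in this review thread can be checked with a small harness. What follows is only a minimal sketch and not Spark's `org.apache.spark.util.Benchmark`: the names `OrderCheck`, `Case`, `run` and `bothOrders` are hypothetical, and keeping the best of the repeated runs is an assumption of the sketch rather than a statement about how Spark reports timings. Each case is run several times in a row, once in the given order and once reversed, so the two sets of timings can be compared.

```scala
object OrderCheck {
  // Hypothetical stand-in for a benchmark case: a name plus the work to time.
  final case class Case(name: String, body: () => Unit)

  // Runs every case `iters` times in a row and keeps the best wall-clock time (ns).
  def run(cases: Seq[Case], iters: Int = 5): Map[String, Long] =
    cases.map { c =>
      val best = (1 to iters).map { _ =>
        val start = System.nanoTime()
        c.body()
        System.nanoTime() - start
      }.min
      c.name -> best
    }.toMap

  // Times the cases in the given order and again in reverse order.
  def bothOrders(cases: Seq[Case], iters: Int = 5): (Map[String, Long], Map[String, Long]) =
    (run(cases, iters), run(cases.reverse, iters))
}
```

If the timings for a case differ noticeably between the two maps, ordering effects such as JIT warm-up are likely influencing the comparison.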
[GitHub] spark pull request: [Spark-8426] [scheduler] enhance blacklist mec...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8760#issuecomment-171210808 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49294/ Test PASSed.
[GitHub] spark pull request: [Spark-8426] [scheduler] enhance blacklist mec...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8760#issuecomment-171210609 **[Test build #49294 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49294/consoleFull)** for PR 8760 at commit [`25a7c6f`](https://github.com/apache/spark/commit/25a7c6fc329687310f1ae88e56bcd72a5c259b61). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12796] [SQL] Whole stage codegen
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/10735#discussion_r49560296
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/BenchmarkWholeStageCodegen.scala ---
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution
+
+import org.apache.spark.sql.SQLContext
+import org.apache.spark.util.Benchmark
+import org.apache.spark.{SparkFunSuite, SparkConf, SparkContext}
+
+/**
+ * Benchmark to measure whole stage codegen performance.
+ * To run this:
+ *  build/sbt sql/test-only *BenchmarkWholeStageCodegen
+ */
+class BenchmarkWholeStageCodegen extends SparkFunSuite {
+  val conf = new SparkConf()
+  val sc = new SparkContext("local[1]", "test-sql-context", conf)
+  val sqlContext = new SQLContext(sc)
+
+  def testWholeStage(values: Int): Unit = {
+    val benchmark = new Benchmark("Single Int Column Scan", values)
+
+    benchmark.addCase("Without whole stage codegen") { iter =>
--- End diff --

we should also consider switching the order and see if the results change.
[GitHub] spark pull request: [SPARK-12692][BUILD]Enforce style checking abo...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/10736#issuecomment-171218362 I've merged this.
[GitHub] spark pull request: [HotFox] HotFix for scala style in KinesisBack...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10738#issuecomment-171250827 **[Test build #49310 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49310/consoleFull)** for PR 10738 at commit [`6460add`](https://github.com/apache/spark/commit/6460adda640c931af7008a462561ab408ded6bf3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [HotFox] HotFix for scala style in KinesisBack...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10738#issuecomment-171250952 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49310/ Test PASSed.
[GitHub] spark pull request: [HotFox] HotFix for scala style in KinesisBack...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10738#issuecomment-171250951 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12796] [SQL] Whole stage codegen
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/10735#discussion_r49561357
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/BenchmarkWholeStageCodegen.scala ---
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution
+
+import org.apache.spark.sql.SQLContext
+import org.apache.spark.util.Benchmark
+import org.apache.spark.{SparkFunSuite, SparkConf, SparkContext}
+
+/**
+ * Benchmark to measure whole stage codegen performance.
+ * To run this:
+ *  build/sbt sql/test-only *BenchmarkWholeStageCodegen
+ */
+class BenchmarkWholeStageCodegen extends SparkFunSuite {
+  val conf = new SparkConf()
+  val sc = new SparkContext("local[1]", "test-sql-context", conf)
+  val sqlContext = new SQLContext(sc)
+
+  def testWholeStage(values: Int): Unit = {
+    val benchmark = new Benchmark("Single Int Column Scan", values)
+
+    benchmark.addCase("Without whole stage codegen") { iter =>
--- End diff --

I tried it and it didn't make any difference. FYI, on my machine with multiple runs, the speedup is around 2.5X. I have fewer cores but a slightly higher frequency.

Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz
[GitHub] spark pull request: [SPARK-12796] [SQL] Whole stage codegen
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/10735#discussion_r49561388
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/BenchmarkWholeStageCodegen.scala ---
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution
+
+import org.apache.spark.sql.SQLContext
+import org.apache.spark.util.Benchmark
+import org.apache.spark.{SparkFunSuite, SparkConf, SparkContext}
+
+/**
+ * Benchmark to measure whole stage codegen performance.
+ * To run this:
+ *  build/sbt sql/test-only *BenchmarkWholeStageCodegen
--- End diff --

```
build/sbt "sql/test-only *BenchmarkWholeStageCodegen"
```
(need to add the quotes)
[GitHub] spark pull request: [SPARK-9439] [yarn] External shuffle service r...
Github user johd01 commented on the pull request: https://github.com/apache/spark/pull/7943#issuecomment-171216053

Hi, this does not seem to work properly. After the first nodemanager restart:

2016-01-13 08:22:24,472 ERROR shuffle.ExternalShuffleBlockResolver (ExternalShuffleBlockResolver.java:<init>(113)) - error opening leveldb file /var/hadoop/data1/yarn/nm/registeredExecutors.ldb. Creating new file, will not be able to recover state for existing applications
org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: /usr/hdp/2.3.2.0-2950/hadoop-yarn/ /var/hadoop/data1/yarn/nm/registeredExecutors.ldb/LOCK: No such file or directory
    at org.fusesource.leveldbjni.internal.NativeDB.checkStatus(NativeDB.java:200)
    at org.fusesource.leveldbjni.internal.NativeDB.open(NativeDB.java:218)
    at org.fusesource.leveldbjni.JniDBFactory.open(JniDBFactory.java:168)
    at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.<init>(ExternalShuffleBlockResolver.java:100)
    at org.apache.spark.network.shuffle.ExternalShuffleBlockResolver.<init>(ExternalShuffleBlockResolver.java:81)
    at org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.<init>(ExternalShuffleBlockHandler.java:56)
    at org.apache.spark.network.yarn.YarnShuffleService.serviceInit(YarnShuffleService.java:128)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.serviceInit(AuxServices.java:143)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.serviceInit(ContainerManagerImpl.java:245)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:291)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:537)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:585)
2016-01-13 08:22:24,474 WARN shuffle.ExternalShuffleBlockResolver (ExternalShuffleBlockResolver.java:<init>(123)) - error deleting /var/hadoop/data1/yarn/nm/registeredExecutors.ldb
2016-01-13 08:22:24,474 ERROR yarn.YarnShuffleService (YarnShuffleService.java:serviceInit(130)) - Failed to initialize external shuffle service
java.io.IOException: Unable to create state store

Workaround: switch back to spark-1.5.2-yarn.shuffle.jar
[GitHub] spark pull request: [SPARK-12692][BUILD]Enforce style checking abo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10736#issuecomment-171216790 **[Test build #49301 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49301/consoleFull)** for PR 10736 at commit [`44b3698`](https://github.com/apache/spark/commit/44b36981d553a7f56150d6b0a0caa88eca30e1c5). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12692][BUILD]Enforce style checking abo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10736#issuecomment-171217004 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49301/ Test PASSed.
[GitHub] spark pull request: [SPARK-9844][CORE] File appender race conditio...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/10714#issuecomment-171224237 LGTM
[GitHub] spark pull request: [SPARK-12265][Mesos] Spark calls System.exit i...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/10729#issuecomment-171227005 @nraychaudhuri I think this is a step in a good direction, but now if there is an error, how does the process know to exit? It seems like it just continues. The semantics have changed, unless I'm missing something.
[GitHub] spark pull request: [HotFox] HotFix for scala style in KinesisBack...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/10738#issuecomment-171239173 ping @marmbrus @liancheng
[GitHub] spark pull request: [SPARK-12756][SQL] use hash expression in Exch...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10703#issuecomment-171212404 **[Test build #49295 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49295/consoleFull)** for PR 10703 at commit [`1c2d42b`](https://github.com/apache/spark/commit/1c2d42b4f7b426efcfb806414555bed51d98f4e3). * This patch **fails SparkR unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12791][SQL] Simplify CaseWhen by breaki...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/10734#discussion_r49561587
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala ---
@@ -183,32 +175,38 @@ case class CaseWhen(branches: Seq[Expression]) extends Expression {
       boolean ${ev.isNull} = true;
       ${ctx.javaType(dataType)} ${ev.value} = ${ctx.defaultValue(dataType)};
       $cases
-      $other
+      $elseCase
     """
   }

   override def toString: String = {
-    "CASE" + branches.sliding(2, 2).map {
-      case Seq(cond, value) => s" WHEN $cond THEN $value"
-      case Seq(elseValue) => s" ELSE $elseValue"
-    }.mkString
+    val cases = branches.map { case (c, v) => s" WHEN $c THEN $v" }.mkString
+    val elseCase = elseValue.map(" ELSE " + _).getOrElse("")
+    "CASE" + cases + elseCase + " END"
   }

-  override def sql: String = {
-    val branchesSQL = branches.map(_.sql)
-    val (cases, maybeElse) = if (branches.length % 2 == 0) {
-      (branchesSQL, None)
-    } else {
-      (branchesSQL.init, Some(branchesSQL.last))
-    }
+  override def sql: String = prettyString
--- End diff --

actually this works for case when, but not for all the other expressions yet.
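
To see what the new `toString` in the diff above produces, here is a minimal standalone sketch. It is not the Catalyst class: `CaseWhenSketch` is a hypothetical name, and conditions and values are plain strings instead of `Expression`s, which is an assumption made only to keep the example self-contained. The body of `toString` mirrors the added lines of the diff.

```scala
object CaseWhenSketch {
  // Simplified stand-in: (condition, value) pairs plus an optional ELSE value,
  // all as plain strings rather than Catalyst expressions.
  final case class CaseWhen(
      branches: Seq[(String, String)],
      elseValue: Option[String] = None) {

    override def toString: String = {
      // One " WHEN ... THEN ..." fragment per branch, concatenated without separators.
      val cases = branches.map { case (c, v) => s" WHEN $c THEN $v" }.mkString
      // The ELSE fragment is emitted only when an else value is present.
      val elseCase = elseValue.map(" ELSE " + _).getOrElse("")
      "CASE" + cases + elseCase + " END"
    }
  }
}
```

Modelling the branches as pairs with a separate optional else value removes the need for the old `sliding(2, 2)` trick, where an odd-length sequence implied a trailing ELSE.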
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-171214203 **[Test build #49300 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49300/consoleFull)** for PR 10029 at commit [`832db06`](https://github.com/apache/spark/commit/832db06a67980d1aa51bb8330ec86dbc1f1a869c). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `abstract class Covariance(left: Expression, right: Expression) extends ImperativeAggregate`
[GitHub] spark pull request: [SPARK-12692][BUILD]Enforce style checking abo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10736#issuecomment-171217002 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/10029#discussion_r49562972

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala ---
@@ -0,0 +1,205 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst.expressions.aggregate
+
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.util.TypeUtils
+import org.apache.spark.sql.types._
+
+/**
+ * Compute the covariance between two expressions.
+ * When applied on empty data (i.e., count is zero), it returns NULL.
+ *
+ */
+abstract class Covariance(
+    left: Expression,
+    right: Expression)
+  extends ImperativeAggregate with Serializable {
+  override def children: Seq[Expression] = Seq(left, right)
+
+  override def nullable: Boolean = true
+
+  override def dataType: DataType = DoubleType
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, DoubleType)
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    if (left.dataType.isInstanceOf[DoubleType] && right.dataType.isInstanceOf[DoubleType]) {
+      TypeCheckResult.TypeCheckSuccess
+    } else {
+      TypeCheckResult.TypeCheckFailure(
+        s"covariance requires that both arguments are double type, " +
+          s"not (${left.dataType}, ${right.dataType}).")
+    }
+  }
+
+  override def aggBufferSchema: StructType = StructType.fromAttributes(aggBufferAttributes)
+
+  override def inputAggBufferAttributes: Seq[AttributeReference] = {
+    aggBufferAttributes.map(_.newInstance())
+  }
+
+  override val aggBufferAttributes: Seq[AttributeReference] = Seq(
+    AttributeReference("xAvg", DoubleType)(),
+    AttributeReference("yAvg", DoubleType)(),
+    AttributeReference("Ck", DoubleType)(),
+    AttributeReference("count", LongType)())
+
+  // Local cache of mutableAggBufferOffset(s) that will be used in update and merge
+  val xAvgOffset = mutableAggBufferOffset
+  val yAvgOffset = mutableAggBufferOffset + 1
+  val CkOffset = mutableAggBufferOffset + 2
+  val countOffset = mutableAggBufferOffset + 3
+
+  // Local cache of inputAggBufferOffset(s) that will be used in update and merge
+  val inputXAvgOffset = inputAggBufferOffset
+  val inputYAvgOffset = inputAggBufferOffset + 1
+  val inputCkOffset = inputAggBufferOffset + 2
+  val inputCountOffset = inputAggBufferOffset + 3
+
+  override def initialize(buffer: MutableRow): Unit = {
+    buffer.setDouble(xAvgOffset, 0.0)
+    buffer.setDouble(yAvgOffset, 0.0)
+    buffer.setDouble(CkOffset, 0.0)
+    buffer.setLong(countOffset, 0L)
+  }
+
+  override def update(buffer: MutableRow, input: InternalRow): Unit = {
+    val leftEval = left.eval(input)
+    val rightEval = right.eval(input)
+
+    if (leftEval != null && rightEval != null) {
+      val x = leftEval.asInstanceOf[Double]
+      val y = rightEval.asInstanceOf[Double]
+
+      var xAvg = buffer.getDouble(xAvgOffset)
+      var yAvg = buffer.getDouble(yAvgOffset)
+      var Ck = buffer.getDouble(CkOffset)
+      var count = buffer.getLong(countOffset)
+
+      val deltaX = x - xAvg
+      val deltaY = y - yAvg
+      count += 1
+      xAvg += deltaX / count
+      yAvg += deltaY / count
+      Ck += deltaX * (y - yAvg)
+
+      buffer.setDouble(xAvgOffset, xAvg)
+      buffer.setDouble(yAvgOffset, yAvg)
+      buffer.setDouble(CkOffset, Ck)
+      buffer.setLong(countOffset, count)
+    }
+  }
+
+  // Merge counters from other partitions. Formula can be found at:
+  // http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance
+  override def merge(buffer1:
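The one-pass update in the diff can be tried outside Spark with a small model of the aggregation buffer. The sketch below copies the arithmetic of `update` from the diff; the `merge` step is an assumption, since the diff is cut off before its body: it uses the standard pairwise co-moment combination described on the Wikipedia page the comment cites. `CovBuffer`, `CovarianceSketch`, `covarPop` and `covarSamp` are hypothetical names, and returning `None` for fewer than two rows in the sample case is also an assumption.

```scala
// Hypothetical, Spark-free model of the four-slot aggregation buffer in the diff.
final case class CovBuffer(xAvg: Double, yAvg: Double, ck: Double, count: Long)

object CovarianceSketch {
  val empty: CovBuffer = CovBuffer(0.0, 0.0, 0.0, 0L)

  // Same arithmetic as update() in the diff: fold in one (x, y) row.
  def update(b: CovBuffer, x: Double, y: Double): CovBuffer = {
    val n = b.count + 1
    val deltaX = x - b.xAvg
    val deltaY = y - b.yAvg
    val xAvg = b.xAvg + deltaX / n
    val yAvg = b.yAvg + deltaY / n
    // The co-moment uses the old x mean (via deltaX) and the new y mean.
    CovBuffer(xAvg, yAvg, b.ck + deltaX * (y - yAvg), n)
  }

  // Assumed merge: standard pairwise combination of two partial buffers.
  def merge(a: CovBuffer, b: CovBuffer): CovBuffer = {
    val n = a.count + b.count
    if (n == 0L) empty
    else {
      val deltaX = b.xAvg - a.xAvg
      val deltaY = b.yAvg - a.yAvg
      CovBuffer(
        a.xAvg + deltaX * b.count / n,
        a.yAvg + deltaY * b.count / n,
        a.ck + b.ck + deltaX * deltaY * a.count * b.count / n,
        n)
    }
  }

  // Population covariance divides the co-moment by n; None models SQL NULL.
  def covarPop(b: CovBuffer): Option[Double] =
    if (b.count == 0L) None else Some(b.ck / b.count)

  // Sample covariance divides by n - 1 (None below two rows is an assumption).
  def covarSamp(b: CovBuffer): Option[Double] =
    if (b.count < 2L) None else Some(b.ck / (b.count - 1))
}
```

Folding the rows (1, 2), (2, 4), (3, 6), (4, 8) through `update` leaves a co-moment of 10, so `covarPop` gives 2.5; splitting the rows into two halves and combining them with `merge` gives the same buffer.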
[GitHub] spark pull request: [SPARK-5493] [core] Add option to impersonate ...
Github user hemshankar commented on the pull request: https://github.com/apache/spark/pull/4405#issuecomment-171229397 What do we mean by plumbing of the user name through the UI?
[GitHub] spark pull request: [SPARK-5493] [core] Add option to impersonate ...
Github user hemshankar commented on the pull request: https://github.com/apache/spark/pull/4405#issuecomment-171233550

I have a few doubts about running in client mode and cluster mode. Currently I am using a Cloudera Hadoop single-node cluster (Kerberos enabled).

In client mode I use the following commands:

kinit
spark-submit --master yarn-client --proxy-user cloudera examples/src/main/python/pi.py

This works fine.

In cluster mode I use the following command (no kinit done and no TGT present in the cache):

spark-submit --principal --keytab --master yarn-cluster examples/src/main/python/pi.py

This also works fine.

But when I use the following command in cluster mode (no kinit done and no TGT present in the cache):

spark-submit --principal --keytab --master yarn-cluster --proxy-user cloudera examples/src/main/python/pi.py

it throws the following error:

No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)

I guess that in cluster mode spark-submit does not look for a TGT on the client machine; it transfers the keytab file to the cluster and then starts the Spark job. So why does specifying the "--proxy-user" option make it look for a TGT when submitting in "yarn-cluster" mode? Am I doing something wrong?
[GitHub] spark pull request: [SPARK-12771][SQL] Improve CaseWhen codegen by...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10737#issuecomment-171234319 **[Test build #49304 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49304/consoleFull)** for PR 10737 at commit [`d6f617c`](https://github.com/apache/spark/commit/d6f617c2a21ef1f353219640901d4235f45adda6). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [HotFox] HotFix for scala style in KinesisBack...
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/10738

[HotFox] HotFix for scala style in KinesisBackedBlockRDDSuite

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 hotfix-kinesis

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10738.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #10738

commit 6460adda640c931af7008a462561ab408ded6bf3
Author: Liang-Chi Hsieh
Date: 2016-01-13T10:00:49Z

HotFix for scala style.
[GitHub] spark pull request: [SPARK-12543] [SPARK-4226] [SQL] Subquery in e...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10706#issuecomment-171241181 **[Test build #49306 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49306/consoleFull)** for PR 10706 at commit [`d1feebd`](https://github.com/apache/spark/commit/d1feebd276fe620fb162ad94ed3d544b6b075366). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-171214447 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-171214450 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49300/ Test PASSed.
[GitHub] spark pull request: [SPARK-12705] [SQL] AnalysisException: Sorting...
Github user gatorsmile commented on the pull request: https://github.com/apache/spark/pull/10678#issuecomment-171215649 Based on @davies's comments, this update avoids using an extra buffer in the recursive function. It should run faster and make the code simpler.
[GitHub] spark pull request: [SPARK-12705] [SQL] AnalysisException: Sorting...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10678#issuecomment-171217566 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12705] [SQL] AnalysisException: Sorting...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10678#issuecomment-171217567 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49307/ Test FAILed.
[GitHub] spark pull request: [SPARK-12692][BUILD]Enforce style checking abo...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/10736
[GitHub] spark pull request: [SPARK-12705] [SQL] AnalysisException: Sorting...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/10678#issuecomment-171218582 @gatorsmile For Subquery/Filter/Join, because they will have the same output as child/children, we should continue to push the attributes through them.
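The point about operators that keep their child's output can be illustrated with a toy plan tree. This is only a sketch under stated assumptions: `Plan`, `Relation`, `Project`, `Filter` and `addMissing` below are hypothetical and much simpler than Catalyst's classes, attributes are plain strings, and the code is not taken from the pull request.

```scala
// Toy logical-plan model; these classes are hypothetical, not Catalyst's.
sealed trait Plan { def output: Seq[String] }

final case class Relation(output: Seq[String]) extends Plan

final case class Project(columns: Seq[String], child: Plan) extends Plan {
  def output: Seq[String] = columns
}

final case class Filter(condition: String, child: Plan) extends Plan {
  // A Filter drops rows, not columns, so its output equals its child's output.
  def output: Seq[String] = child.output
}

object PushMissingAttributes {
  // Try to make every attribute in `needed` visible in the output of `plan`.
  def addMissing(plan: Plan, needed: Seq[String]): Plan = plan match {
    case Filter(cond, child) =>
      // Output is unchanged by this operator, so keep pushing through it.
      Filter(cond, addMissing(child, needed))
    case Project(cols, child) =>
      // A Project narrows the output: add what is missing and available below.
      val missing = needed.filterNot(cols.contains).filter(child.output.contains)
      Project(cols ++ missing, child)
    case other =>
      other
  }
}
```

With `Filter("a > 1", Project(Seq("a"), Relation(Seq("a", "b"))))` and a sort that needs `b`, `addMissing` walks through the `Filter` and widens the `Project` to `a, b`; stopping at the `Filter` would leave `b` unresolved.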
[GitHub] spark pull request: [SPARK-12771][SQL] Improve CaseWhen codegen by...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10737#issuecomment-171234700 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49304/ Test PASSed.
[GitHub] spark pull request: [SPARK-12771][SQL] Improve CaseWhen codegen by...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10737#issuecomment-171234698 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-171236853 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49309/ Test FAILed.
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-171236848 **[Test build #49309 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49309/consoleFull)** for PR 10029 at commit [`1806ced`](https://github.com/apache/spark/commit/1806ced8dc7bdd7d5f2909aa80a9700516564c32). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-171239983 **[Test build #49305 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49305/consoleFull)** for PR 10029 at commit [`2f643d4`](https://github.com/apache/spark/commit/2f643d414d4ba0723d0c3323ea72246ec8e77706). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [HotFox] HotFix for scala style in KinesisBack...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10738#issuecomment-171240740 **[Test build #49310 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49310/consoleFull)** for PR 10738 at commit [`6460add`](https://github.com/apache/spark/commit/6460adda640c931af7008a462561ab408ded6bf3).
[GitHub] spark pull request: [SPARK-12789]Support order by index
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10731#issuecomment-171249633 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12789]Support order by index
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10731#issuecomment-171249634 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49311/ Test FAILed.
[GitHub] spark pull request: [SPARK-12789]Support order by index
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10731#issuecomment-171249631 **[Test build #49311 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49311/consoleFull)** for PR 10731 at commit [`1cb6752`](https://github.com/apache/spark/commit/1cb675262ed3b90afdff5ffd8b1bfa4965e37df4). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [Spark-8426] [scheduler] enhance blacklist mec...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8760#issuecomment-171210806 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12692][BUILD]Enforce style checking abo...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/10736#issuecomment-171210122 LGTM
[GitHub] spark pull request: [SPARK-12771][SQL] Improve CaseWhen codegen by...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10737#issuecomment-171211213 **[Test build #49304 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49304/consoleFull)** for PR 10737 at commit [`d6f617c`](https://github.com/apache/spark/commit/d6f617c2a21ef1f353219640901d4235f45adda6).
[GitHub] spark pull request: [SPARK-12771][SQL] Improve CaseWhen codegen by...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/10737#issuecomment-171209982 Can you update this once https://github.com/apache/spark/pull/10734 is merged?
[GitHub] spark pull request: [SPARK-12543] [SPARK-4226] [SQL] Subquery in e...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10706#issuecomment-171216187 **[Test build #49306 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49306/consoleFull)** for PR 10706 at commit [`d1feebd`](https://github.com/apache/spark/commit/d1feebd276fe620fb162ad94ed3d544b6b075366).
[GitHub] spark pull request: [SPARK-12705] [SQL] AnalysisException: Sorting...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10678#issuecomment-171217564 **[Test build #49307 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49307/consoleFull)** for PR 10678 at commit [`27fcaa5`](https://github.com/apache/spark/commit/27fcaa5ad6a3b4228ef4fc46b963c1e818d2f5c4). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12791][SQL] Simplify CaseWhen by breaki...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10734#issuecomment-171219390 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49303/ Test FAILed.
[GitHub] spark pull request: [SPARK-12791][SQL] Simplify CaseWhen by breaki...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10734#issuecomment-171219388 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-171236474 **[Test build #49309 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49309/consoleFull)** for PR 10029 at commit [`1806ced`](https://github.com/apache/spark/commit/1806ced8dc7bdd7d5f2909aa80a9700516564c32).
[GitHub] spark pull request: [SPARK-12791][SQL] Simplify CaseWhen by breaki...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10734#issuecomment-171236469 **[Test build #2375 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2375/consoleFull)** for PR 10734 at commit [`e0b21d4`](https://github.com/apache/spark/commit/e0b21d4539339ba1c581c6fc3e45d8282a0eba97). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-171236852 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12791][SQL] Simplify CaseWhen by breaki...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10734#issuecomment-171241865 **[Test build #49308 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49308/consoleFull)** for PR 10734 at commit [`e0b21d4`](https://github.com/apache/spark/commit/e0b21d4539339ba1c581c6fc3e45d8282a0eba97). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12791][SQL] Simplify CaseWhen by breaki...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10734#issuecomment-171242035 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12789]Support order by index
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10731#issuecomment-171249380 **[Test build #49311 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49311/consoleFull)** for PR 10731 at commit [`1cb6752`](https://github.com/apache/spark/commit/1cb675262ed3b90afdff5ffd8b1bfa4965e37df4).
[GitHub] spark pull request: [SPARK-12265][Mesos] Spark calls System.exit i...
Github user dragos commented on the pull request: https://github.com/apache/spark/pull/10729#issuecomment-171255908 @srowen good point, it hangs. But as far as I can see, this is due to a count-down latch that's not released in case of error:
```
"main" #1 prio=5 os_prio=31 tid=0x7f8e53003000 nid=0x1703 waiting on condition [0x70215000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x0007b1b875a8> (a java.util.concurrent.CountDownLatch$Sync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
    at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
    at org.apache.spark.scheduler.cluster.mesos.MesosSchedulerUtils$class.startScheduler(MesosSchedulerUtils.scala:127)
    - locked <0x0007b1b6c860> (a org.apache.spark.scheduler.cluster.mesos.CoarseMesosSchedulerBackend)
    at org.apache.spark.scheduler.cluster.mesos.CoarseMesosSchedulerBackend.startScheduler(CoarseMesosSchedulerBackend.scala:49)
    at org.apache.spark.scheduler.cluster.mesos.CoarseMesosSchedulerBackend.start(CoarseMesosSchedulerBackend.scala:140)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
    at org.apache.spark.SparkContext.(SparkContext.scala:513)
    at org.apache.spark.repl.SparkILoop.createSparkContext(SparkILoop.scala:1022)
    at $line3.$read$$iwC$$iwC.(:15)
    at $line3.$read$$iwC.(:25)
    at $line3.$read.(:27)
    at $line3.$read$.(:31)
    at $line3.$read$.()
    at $line3.$eval$.(:7)
    at $line3.$eval$.()
    at $line3.$eval.$print()
```
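The hang described in the comment above (a thread parked in `CountDownLatch.await` because the error path never calls `countDown`) can be illustrated with a minimal sketch. This is not the code from the pull request and not necessarily how the issue was resolved there; `LatchReleaseSketch`, `startAndWait`, and the `register` callback are hypothetical names standing in for the scheduler start-up. The idea shown is only that the latch is released in a `finally` block and the failure is handed back to the waiting thread.

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}
import java.util.concurrent.atomic.AtomicReference

object LatchReleaseSketch {
  /**
   * Runs `register` on a background thread and waits for it to finish.
   * `register` is a hypothetical stand-in for the driver registration step.
   * Returns Left(error) if registration threw, otherwise Right(released),
   * where `released` is false if the timeout elapsed first.
   */
  def startAndWait(register: () => Unit, timeoutMs: Long): Either[Throwable, Boolean] = {
    val latch = new CountDownLatch(1)
    val failure = new AtomicReference[Throwable](null)

    val worker = new Thread(new Runnable {
      override def run(): Unit = {
        try {
          register()
        } catch {
          // Record the error so the waiting thread can report it.
          case e: Exception => failure.set(e)
        } finally {
          // Release the latch on both the success and the error path.
          latch.countDown()
        }
      }
    })
    worker.setDaemon(true)
    worker.start()

    // A bounded wait avoids parking forever even if the worker never finishes.
    val released = latch.await(timeoutMs, TimeUnit.MILLISECONDS)
    Option(failure.get) match {
      case Some(e) => Left(e)
      case None => Right(released)
    }
  }
}
```

With this shape, a failing `register` surfaces as a `Left` on the calling thread instead of leaving it in `WAITING (parking)`.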
[GitHub] spark pull request: [SPARK-12780] Inconsistency returning value of...
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/10724#issuecomment-171258926 test it please
[GitHub] spark pull request: [SPARK-12756][SQL] use hash expression in Exch...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10703#issuecomment-171212588 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12756][SQL] use hash expression in Exch...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10703#issuecomment-171212589 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49295/ Test FAILed.
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10029#issuecomment-171214830 **[Test build #49305 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49305/consoleFull)** for PR 10029 at commit [`2f643d4`](https://github.com/apache/spark/commit/2f643d414d4ba0723d0c3323ea72246ec8e77706).
[GitHub] spark pull request: [SPARK-12771][SQL] Improve CaseWhen codegen by...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/10737#issuecomment-171214987 @rxin No problem. I will update this.
[GitHub] spark pull request: [SPARK-12796] [SQL] Whole stage codegen
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/10735#discussion_r49561979
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/BenchmarkWholeStageCodegen.scala ---
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution
+
+import org.apache.spark.sql.SQLContext
+import org.apache.spark.util.Benchmark
+import org.apache.spark.{SparkFunSuite, SparkConf, SparkContext}
+
+/**
+ * Benchmark to measure whole stage codegen performance.
+ * To run this:
+ *  build/sbt sql/test-only *BenchmarkWholeStageCodegen
+ */
+class BenchmarkWholeStageCodegen extends SparkFunSuite {
+  val conf = new SparkConf()
+  val sc = new SparkContext("local[1]", "test-sql-context", conf)
+  val sqlContext = new SQLContext(sc)
+
+  def testWholeStage(values: Int): Unit = {
+    val benchmark = new Benchmark("Single Int Column Scan", values)
+
+    benchmark.addCase("Without whole stage codegen") { iter =>
--- End diff --

I also got 2.5X on the first run; once I increased the number of rows it became 3.0X, because both queries carry some fixed overhead (Catalyst and the Spark job).
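One way to read the 2.5X versus 3.0X observation is a back-of-the-envelope model, offered here as an assumption rather than anything measured in the pull request. Suppose both queries pay the same fixed overhead $c$ (Catalyst planning plus Spark job setup) and a per-row cost of $a$ without whole-stage codegen and $b$ with it, where $a > b > 0$. The symbols $c$, $a$, $b$ and the row count $n$ are introduced for illustration only.

$$
S(n) = \frac{c + a\,n}{c + b\,n},
\qquad
\frac{dS}{dn} = \frac{c\,(a - b)}{(c + b\,n)^{2}} > 0,
\qquad
\lim_{n \to \infty} S(n) = \frac{a}{b}.
$$

Under this model the measured speedup $S(n)$ rises with the number of rows and approaches the per-row ratio $a/b$, which is consistent with the ratio improving once the fixed overhead is amortized over more rows.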
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/10029#discussion_r49562260 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala --- @@ -0,0 +1,203 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.aggregate + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.util.TypeUtils +import org.apache.spark.sql.types._ + +/** + * Compute the covariance between two expressions. + * When applied on empty data (i.e., count is zero), it returns NULL. 
+ * + */ +abstract class Covariance(left: Expression, right: Expression) extends ImperativeAggregate +with Serializable { + override def children: Seq[Expression] = Seq(left, right) + + override def nullable: Boolean = true + + override def dataType: DataType = DoubleType + + override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, DoubleType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (left.dataType.isInstanceOf[DoubleType] && right.dataType.isInstanceOf[DoubleType]) { + TypeCheckResult.TypeCheckSuccess +} else { + TypeCheckResult.TypeCheckFailure( +s"covariance requires that both arguments are double type, " + + s"not (${left.dataType}, ${right.dataType}).") +} + } + + override def aggBufferSchema: StructType = StructType.fromAttributes(aggBufferAttributes) + + override def inputAggBufferAttributes: Seq[AttributeReference] = { +aggBufferAttributes.map(_.newInstance()) + } + + override val aggBufferAttributes: Seq[AttributeReference] = Seq( +AttributeReference("xAvg", DoubleType)(), +AttributeReference("yAvg", DoubleType)(), +AttributeReference("Ck", DoubleType)(), +AttributeReference("count", LongType)()) + + // Local cache of mutableAggBufferOffset(s) that will be used in update and merge + val xAvgOffset = mutableAggBufferOffset + val yAvgOffset = mutableAggBufferOffset + 1 + val CkOffset = mutableAggBufferOffset + 2 + val countOffset = mutableAggBufferOffset + 3 + + // Local cache of inputAggBufferOffset(s) that will be used in update and merge + val inputXAvgOffset = inputAggBufferOffset + val inputYAvgOffset = inputAggBufferOffset + 1 + val inputCkOffset = inputAggBufferOffset + 2 + val inputCountOffset = inputAggBufferOffset + 3 + + override def initialize(buffer: MutableRow): Unit = { +buffer.setDouble(xAvgOffset, 0.0) +buffer.setDouble(yAvgOffset, 0.0) +buffer.setDouble(CkOffset, 0.0) +buffer.setLong(countOffset, 0L) + } + + override def update(buffer: MutableRow, input: InternalRow): Unit = { +val leftEval = left.eval(input) 
+val rightEval = right.eval(input) + +if (leftEval != null && rightEval != null) { + val x = leftEval.asInstanceOf[Double] + val y = rightEval.asInstanceOf[Double] + + var xAvg = buffer.getDouble(xAvgOffset) + var yAvg = buffer.getDouble(yAvgOffset) + var Ck = buffer.getDouble(CkOffset) + var count = buffer.getLong(countOffset) + + val deltaX = x - xAvg + val deltaY = y - yAvg + count += 1 + xAvg += deltaX / count + yAvg += deltaY / count + Ck += deltaX * (y - yAvg) + + buffer.setDouble(xAvgOffset, xAvg) + buffer.setDouble(yAvgOffset, yAvg) + buffer.setDouble(CkOffset, Ck) + buffer.setLong(countOffset, count) +} + } + + // Merge counters from other partitions. Formula can be found at: + // http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance + override def merge(buffer1: MutableRow,
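The `update` step in the diff above maintains running means, a co-moment `Ck`, and a count. The following is a minimal standalone sketch of that single-pass scheme, detached from Spark's `ImperativeAggregate` and `MutableRow` APIs. The quoted diff is cut off at `merge`, so the `merge` shown here is the standard pairwise combination formula from the Wikipedia page the diff's comment cites, not the code in the pull request. `OnlineCovarianceSketch`, `State`, `covarPop`, and `covarSamp` are hypothetical names, and returning `None` for fewer than two rows in `covarSamp` is this sketch's own choice.

```scala
object OnlineCovarianceSketch {
  /** Running state: means of x and y, co-moment ck = sum((x - xAvg) * (y - yAvg)), and row count. */
  final case class State(xAvg: Double, yAvg: Double, ck: Double, count: Long) {

    /** Single-row update, mirroring the arithmetic in the quoted `update` method. */
    def update(x: Double, y: Double): State = {
      val n = count + 1
      val deltaX = x - xAvg
      val deltaY = y - yAvg
      val newXAvg = xAvg + deltaX / n
      val newYAvg = yAvg + deltaY / n
      // Uses the old mean for x and the new mean for y, as in the diff.
      State(newXAvg, newYAvg, ck + deltaX * (y - newYAvg), n)
    }

    /** Pairwise merge of two partial states (assumed formula, see lead-in). */
    def merge(other: State): State = {
      if (count == 0L) other
      else if (other.count == 0L) this
      else {
        val n = count + other.count
        val deltaX = other.xAvg - xAvg
        val deltaY = other.yAvg - yAvg
        State(
          xAvg + deltaX * other.count / n,
          yAvg + deltaY * other.count / n,
          ck + other.ck + deltaX * deltaY * count * other.count / n,
          n)
      }
    }

    /** Population covariance; None on empty input, matching the NULL-on-empty note in the diff. */
    def covarPop: Option[Double] = if (count == 0L) None else Some(ck / count)

    /** Sample covariance; None below two rows is a choice made by this sketch. */
    def covarSamp: Option[Double] = if (count < 2L) None else Some(ck / (count - 1))
  }

  val empty: State = State(0.0, 0.0, 0.0, 0L)
}
```

For the points (1, 2), (2, 4), (3, 6), (4, 8), folding `update` over all four rows and merging the states of the first two and last two rows both give a co-moment of 10, so the population covariance is 2.5.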
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/10029#discussion_r49562259 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala --- @@ -0,0 +1,203 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.aggregate + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.util.TypeUtils +import org.apache.spark.sql.types._ + +/** + * Compute the covariance between two expressions. + * When applied on empty data (i.e., count is zero), it returns NULL. 
+ * + */ +abstract class Covariance(left: Expression, right: Expression) extends ImperativeAggregate +with Serializable { + override def children: Seq[Expression] = Seq(left, right) + + override def nullable: Boolean = true + + override def dataType: DataType = DoubleType + + override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, DoubleType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (left.dataType.isInstanceOf[DoubleType] && right.dataType.isInstanceOf[DoubleType]) { + TypeCheckResult.TypeCheckSuccess +} else { + TypeCheckResult.TypeCheckFailure( +s"covariance requires that both arguments are double type, " + + s"not (${left.dataType}, ${right.dataType}).") +} + } + + override def aggBufferSchema: StructType = StructType.fromAttributes(aggBufferAttributes) + + override def inputAggBufferAttributes: Seq[AttributeReference] = { +aggBufferAttributes.map(_.newInstance()) + } + + override val aggBufferAttributes: Seq[AttributeReference] = Seq( +AttributeReference("xAvg", DoubleType)(), +AttributeReference("yAvg", DoubleType)(), +AttributeReference("Ck", DoubleType)(), +AttributeReference("count", LongType)()) + + // Local cache of mutableAggBufferOffset(s) that will be used in update and merge + val xAvgOffset = mutableAggBufferOffset + val yAvgOffset = mutableAggBufferOffset + 1 + val CkOffset = mutableAggBufferOffset + 2 + val countOffset = mutableAggBufferOffset + 3 + + // Local cache of inputAggBufferOffset(s) that will be used in update and merge + val inputXAvgOffset = inputAggBufferOffset + val inputYAvgOffset = inputAggBufferOffset + 1 + val inputCkOffset = inputAggBufferOffset + 2 + val inputCountOffset = inputAggBufferOffset + 3 + + override def initialize(buffer: MutableRow): Unit = { +buffer.setDouble(xAvgOffset, 0.0) +buffer.setDouble(yAvgOffset, 0.0) +buffer.setDouble(CkOffset, 0.0) +buffer.setLong(countOffset, 0L) + } + + override def update(buffer: MutableRow, input: InternalRow): Unit = { +val leftEval = left.eval(input) 
+val rightEval = right.eval(input) + +if (leftEval != null && rightEval != null) { + val x = leftEval.asInstanceOf[Double] + val y = rightEval.asInstanceOf[Double] + + var xAvg = buffer.getDouble(xAvgOffset) + var yAvg = buffer.getDouble(yAvgOffset) + var Ck = buffer.getDouble(CkOffset) + var count = buffer.getLong(countOffset) + + val deltaX = x - xAvg + val deltaY = y - yAvg + count += 1 + xAvg += deltaX / count + yAvg += deltaY / count + Ck += deltaX * (y - yAvg) + + buffer.setDouble(xAvgOffset, xAvg) + buffer.setDouble(yAvgOffset, yAvg) + buffer.setDouble(CkOffset, Ck) + buffer.setLong(countOffset, count) +} + } + + // Merge counters from other partitions. Formula can be found at: + // http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance + override def merge(buffer1: MutableRow,
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/10029#discussion_r49562404 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala --- @@ -0,0 +1,203 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.aggregate + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.util.TypeUtils +import org.apache.spark.sql.types._ + +/** + * Compute the covariance between two expressions. + * When applied on empty data (i.e., count is zero), it returns NULL. 
+ * + */ +abstract class Covariance(left: Expression, right: Expression) extends ImperativeAggregate +with Serializable { + override def children: Seq[Expression] = Seq(left, right) + + override def nullable: Boolean = true + + override def dataType: DataType = DoubleType + + override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, DoubleType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (left.dataType.isInstanceOf[DoubleType] && right.dataType.isInstanceOf[DoubleType]) { + TypeCheckResult.TypeCheckSuccess +} else { + TypeCheckResult.TypeCheckFailure( +s"covariance requires that both arguments are double type, " + + s"not (${left.dataType}, ${right.dataType}).") +} + } + + override def aggBufferSchema: StructType = StructType.fromAttributes(aggBufferAttributes) + + override def inputAggBufferAttributes: Seq[AttributeReference] = { +aggBufferAttributes.map(_.newInstance()) + } + + override val aggBufferAttributes: Seq[AttributeReference] = Seq( +AttributeReference("xAvg", DoubleType)(), +AttributeReference("yAvg", DoubleType)(), +AttributeReference("Ck", DoubleType)(), +AttributeReference("count", LongType)()) + + // Local cache of mutableAggBufferOffset(s) that will be used in update and merge + val xAvgOffset = mutableAggBufferOffset + val yAvgOffset = mutableAggBufferOffset + 1 + val CkOffset = mutableAggBufferOffset + 2 + val countOffset = mutableAggBufferOffset + 3 + + // Local cache of inputAggBufferOffset(s) that will be used in update and merge + val inputXAvgOffset = inputAggBufferOffset + val inputYAvgOffset = inputAggBufferOffset + 1 + val inputCkOffset = inputAggBufferOffset + 2 + val inputCountOffset = inputAggBufferOffset + 3 + + override def initialize(buffer: MutableRow): Unit = { +buffer.setDouble(xAvgOffset, 0.0) +buffer.setDouble(yAvgOffset, 0.0) +buffer.setDouble(CkOffset, 0.0) +buffer.setLong(countOffset, 0L) + } + + override def update(buffer: MutableRow, input: InternalRow): Unit = { +val leftEval = left.eval(input) 
+val rightEval = right.eval(input) + +if (leftEval != null && rightEval != null) { + val x = leftEval.asInstanceOf[Double] + val y = rightEval.asInstanceOf[Double] + + var xAvg = buffer.getDouble(xAvgOffset) + var yAvg = buffer.getDouble(yAvgOffset) + var Ck = buffer.getDouble(CkOffset) + var count = buffer.getLong(countOffset) + + val deltaX = x - xAvg + val deltaY = y - yAvg + count += 1 + xAvg += deltaX / count + yAvg += deltaY / count + Ck += deltaX * (y - yAvg) + + buffer.setDouble(xAvgOffset, xAvg) + buffer.setDouble(yAvgOffset, yAvg) + buffer.setDouble(CkOffset, Ck) + buffer.setLong(countOffset, count) +} + } + + // Merge counters from other partitions. Formula can be found at: + // http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance + override def merge(buffer1: MutableRow,
[GitHub] spark pull request: [SPARK-12791][SQL] Simplify CaseWhen by breaki...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10734#issuecomment-171215882 **[Test build #2375 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/2375/consoleFull)** for PR 10734 at commit [`e0b21d4`](https://github.com/apache/spark/commit/e0b21d4539339ba1c581c6fc3e45d8282a0eba97).
[GitHub] spark pull request: [SPARK-9297][SQL] Add covar_pop and covar_samp
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/10029#discussion_r49562322 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Covariance.scala --- @@ -0,0 +1,205 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.catalyst.expressions.aggregate + +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.analysis.TypeCheckResult +import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.catalyst.util.TypeUtils +import org.apache.spark.sql.types._ + +/** + * Compute the covariance between two expressions. + * When applied on empty data (i.e., count is zero), it returns NULL. 
+ * + */ +abstract class Covariance( +left: Expression, +right: Expression) + extends ImperativeAggregate with Serializable { + override def children: Seq[Expression] = Seq(left, right) + + override def nullable: Boolean = true + + override def dataType: DataType = DoubleType + + override def inputTypes: Seq[AbstractDataType] = Seq(DoubleType, DoubleType) + + override def checkInputDataTypes(): TypeCheckResult = { +if (left.dataType.isInstanceOf[DoubleType] && right.dataType.isInstanceOf[DoubleType]) { + TypeCheckResult.TypeCheckSuccess +} else { + TypeCheckResult.TypeCheckFailure( +s"covariance requires that both arguments are double type, " + + s"not (${left.dataType}, ${right.dataType}).") +} + } + + override def aggBufferSchema: StructType = StructType.fromAttributes(aggBufferAttributes) + + override def inputAggBufferAttributes: Seq[AttributeReference] = { +aggBufferAttributes.map(_.newInstance()) + } + + override val aggBufferAttributes: Seq[AttributeReference] = Seq( +AttributeReference("xAvg", DoubleType)(), +AttributeReference("yAvg", DoubleType)(), +AttributeReference("Ck", DoubleType)(), +AttributeReference("count", LongType)()) + + // Local cache of mutableAggBufferOffset(s) that will be used in update and merge + val xAvgOffset = mutableAggBufferOffset + val yAvgOffset = mutableAggBufferOffset + 1 + val CkOffset = mutableAggBufferOffset + 2 + val countOffset = mutableAggBufferOffset + 3 + + // Local cache of inputAggBufferOffset(s) that will be used in update and merge + val inputXAvgOffset = inputAggBufferOffset + val inputYAvgOffset = inputAggBufferOffset + 1 + val inputCkOffset = inputAggBufferOffset + 2 + val inputCountOffset = inputAggBufferOffset + 3 + + override def initialize(buffer: MutableRow): Unit = { +buffer.setDouble(xAvgOffset, 0.0) +buffer.setDouble(yAvgOffset, 0.0) +buffer.setDouble(CkOffset, 0.0) +buffer.setLong(countOffset, 0L) + } + + override def update(buffer: MutableRow, input: InternalRow): Unit = { +val leftEval = 
left.eval(input) +val rightEval = right.eval(input) + +if (leftEval != null && rightEval != null) { + val x = leftEval.asInstanceOf[Double] + val y = rightEval.asInstanceOf[Double] + + var xAvg = buffer.getDouble(xAvgOffset) + var yAvg = buffer.getDouble(yAvgOffset) + var Ck = buffer.getDouble(CkOffset) + var count = buffer.getLong(countOffset) + + val deltaX = x - xAvg + val deltaY = y - yAvg + count += 1 + xAvg += deltaX / count + yAvg += deltaY / count + Ck += deltaX * (y - yAvg) + + buffer.setDouble(xAvgOffset, xAvg) + buffer.setDouble(yAvgOffset, yAvg) + buffer.setDouble(CkOffset, Ck) + buffer.setLong(countOffset, count) +} + } + + // Merge counters from other partitions. Formula can be found at: + // http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance + override def merge(buffer1:
[GitHub] spark pull request: [SPARK-12705] [SQL] AnalysisException: Sorting...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/10678#discussion_r49563404 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -521,38 +522,90 @@ class Analyzer( */ object ResolveSortReferences extends Rule[LogicalPlan] { def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperators { - case s @ Sort(ordering, global, p @ Project(projectList, child)) - if !s.resolved && p.resolved => -val (newOrdering, missing) = resolveAndFindMissing(ordering, p, child) + case s @ Sort(_, _, child) if !s.resolved && child.resolved => +val (newOrdering, missingResolvableAttrs) = collectResolvedMissingAttrs(s.order, child) -// If this rule was not a no-op, return the transformed plan, otherwise return the original. -if (missing.nonEmpty) { - // Add missing attributes and then project them away after the sort. - Project(p.output, -Sort(newOrdering, global, - Project(projectList ++ missing, child))) -} else { - logDebug(s"Failed to find $missing in ${p.output.mkString(", ")}") +if (missingResolvableAttrs.isEmpty) { + val unresolvableAttrs = s.order.filterNot(_.resolved) + logDebug(s"Failed to find $unresolvableAttrs in ${child.output.mkString(", ")}") s // Nothing we can do here. Return original plan. 
} +else { + var stop: Boolean = false + var missingAttrs: Seq[Attribute] = missingResolvableAttrs + val newChild = child transform { +case p: Project if !stop && missingAttrs.nonEmpty => + val newList = p.projectList ++ missingAttrs + missingAttrs = missingAttrs.filterNot( +attr => p.child.outputSet.exists(_.semanticEquals(attr))) + p.copy(projectList = newList) +case w: Window if !stop && missingAttrs.nonEmpty => + val newList = w.projectList ++ missingAttrs + missingAttrs = missingAttrs.filterNot( +attr => w.child.outputSet.exists(_.semanticEquals(attr))) + w.copy(projectList = newList) +case a: Aggregate if !stop && missingAttrs.nonEmpty => + // Grouping expressions could already have the missing attributes. + // Do not add the duplicate attributes. + val newGroupExpressions = a.groupingExpressions ++ missingAttrs.filterNot( +attr => a.groupingExpressions.exists(_.semanticEquals(attr))) + val newAggregateExpressions = a.aggregateExpressions ++ missingAttrs + missingAttrs = missingAttrs.filterNot( +attr => a.child.outputSet.exists(_.semanticEquals(attr))) + a.copy(groupingExpressions = newGroupExpressions, +aggregateExpressions = newAggregateExpressions) +case s: Subquery if !stop && missingAttrs.nonEmpty => s +case o => --- End diff -- How about Filter (having clause)?
[GitHub] spark pull request: [SPARK-12791][SQL] Simplify CaseWhen by breaki...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10734#issuecomment-171217947 **[Test build #49308 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49308/consoleFull)** for PR 10734 at commit [`e0b21d4`](https://github.com/apache/spark/commit/e0b21d4539339ba1c581c6fc3e45d8282a0eba97).
[GitHub] spark pull request: [SPARK-12543] [SPARK-4226] [SQL] Subquery in e...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10706#issuecomment-171241464 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49306/ Test PASSed.
[GitHub] spark pull request: [SPARK-12685] [MLlib] [Backport to 1.4]word2ve...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/10721#issuecomment-171241765 @hhbyyh do you know if it cherry-picks cleanly into other branches? @jkbradley indicated it didn't. Back-porting to 1.6 makes sense; 1.5 maybe; 1.4 seems pretty old as it's very unlikely to see another release.
[GitHub] spark pull request: [SPARK-12543] [SPARK-4226] [SQL] Subquery in e...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10706#issuecomment-171241459 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12780] Inconsistency returning value of...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10724#issuecomment-171261769 **[Test build #49313 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49313/consoleFull)** for PR 10724 at commit [`becc49c`](https://github.com/apache/spark/commit/becc49c2fd137a77836b58730f98f986e2aea36d).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12780] Inconsistency returning value of...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10724#issuecomment-171261775 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49313/ Test FAILed.
[GitHub] spark pull request: [SPARK-12265][Mesos] Spark calls System.exit i...
Github user dragos commented on the pull request: https://github.com/apache/spark/pull/10729#issuecomment-171286695 Yes, I see it hanging. I ran this on a Mesos cluster with no roles defined.
```
bin/spark-shell --conf spark.mesos.role=mu --master mesos://192.168.99.100:5050
```
[GitHub] spark pull request: [SPARK-12780] Inconsistency returning value of...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10724#issuecomment-171265799 **[Test build #49314 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49314/consoleFull)** for PR 10724 at commit [`becc49c`](https://github.com/apache/spark/commit/becc49c2fd137a77836b58730f98f986e2aea36d).
[GitHub] spark pull request: [SPARK-12265][Mesos] Spark calls System.exit i...
Github user dragos commented on a diff in the pull request: https://github.com/apache/spark/pull/10729#discussion_r49588313
--- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackend.scala ---
@@ -376,6 +376,7 @@ private[spark] class MesosSchedulerBackend(
     inClassLoader() {
       logError("Mesos error: " + message)
       scheduler.error(message)
--- End diff --

The problem is this call: it throws an exception, so `markErr` is never called.
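The ordering problem described in this comment can be reproduced in isolation. The following is a minimal sketch, not the code from the pull request: it assumes that `markErr` releases a `java.util.concurrent.CountDownLatch` (the thread only says it "takes care of the countdown latch"), and `failingError`, `releaseAfterCall`, `releaseInFinally` and `remainingAfter` are hypothetical names standing in for `scheduler.error` and the surrounding handler.

```scala
import java.util.concurrent.CountDownLatch

object LatchSketch {
  // Hypothetical stand-in for scheduler.error(message): it always throws.
  def failingError(message: String): Unit =
    throw new IllegalStateException(message)

  // Same ordering as in the diff: the latch is released only if the
  // preceding call returns normally.
  def releaseAfterCall(latch: CountDownLatch): Unit = {
    failingError("Mesos error")
    latch.countDown() // not reached when failingError throws
  }

  // One possible alternative (an assumption, not the PR's fix):
  // release the latch in `finally` so it runs even on an exception.
  def releaseInFinally(latch: CountDownLatch): Unit = {
    try failingError("Mesos error")
    finally latch.countDown()
  }

  // Runs `body` against a fresh latch, swallows the expected exception,
  // and reports how many counts are still outstanding.
  def remainingAfter(body: CountDownLatch => Unit): Long = {
    val latch = new CountDownLatch(1)
    try body(latch) catch { case _: IllegalStateException => () }
    latch.getCount
  }
}
```

With the first ordering the latch keeps its count, so any thread blocked in `await()` would keep waiting, which is consistent with the hang reported in this thread. Whether the real fix should reorder the calls or use `finally` is a design decision for the PR authors.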
[GitHub] spark pull request: [SPARK-12780] Inconsistency returning value of...
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/10724#issuecomment-171262525 retest this please
[GitHub] spark pull request: Changes to support KMeans with large feature s...
GitHub user levin-royl opened a pull request: https://github.com/apache/spark/pull/10739

Changes to support KMeans with large feature space

The problem: In Spark's KMeans code the center vectors are always represented as dense vectors. As a result, when each center has a large domain space, the algorithm quickly runs out of memory. In my example I have a feature space of around 5 and k ~= 500. This sums up to around 200MB RAM for the center vectors alone, while in fact the center vectors are very sparse and require a lot less RAM. Since I am running on a system with relatively low resources, I keep getting OutOfMemory errors. In my setting it is OK to trade off runtime for using less RAM. This is what I set out to do in my solution, while allowing users the flexibility to choose.

My solution: Allow the KMeans algorithm to accept a VectorFactory, which decides when vectors used inside the algorithm should be sparse and when they should be dense. For backward compatibility, the default behavior is to always make them dense (as is the case now). A user can now provide a SmartVectorFactory (or some proprietary VectorFactory) that can decide to make vectors sparse.

For this I made the following changes:
(1) Added a method called reassign to SparseVectors that allows changing the indices and values
(2) Allowed axpy to accept SparseVectors
(3) Created a trait called VectorFactory and two implementations of it that are used within the KMeans code

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/levin-royl/spark SupportLargeFeatureDomains

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10739.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #10739

commit 33d760c7d848da66d8a84523f11a7fc38ff1afc4
Author: Roy Levin
Date: 2016-01-13T10:47:11Z

    Changes to support KMeans with large feature space
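To make the VectorFactory idea from this pull request description concrete, here is a rough, self-contained sketch. It is not the code from the PR: `Vec`, `DenseVec` and `SparseVec` are simplified stand-ins for MLlib's vector classes, the `create` signature and the `maxDensity` threshold are assumptions made for illustration, and only the names `VectorFactory` and `SmartVectorFactory` come from the description.

```scala
object VectorFactorySketch {
  // Simplified stand-ins for MLlib's vector types (hypothetical).
  sealed trait Vec { def size: Int }
  final case class DenseVec(values: Array[Double]) extends Vec {
    def size: Int = values.length
  }
  final case class SparseVec(size: Int, indices: Array[Int], values: Array[Double])
    extends Vec

  // Decides which representation to use for a vector built inside the algorithm.
  trait VectorFactory {
    def create(values: Array[Double]): Vec
  }

  // Backward-compatible default: always dense.
  object DenseVectorFactory extends VectorFactory {
    def create(values: Array[Double]): Vec = DenseVec(values)
  }

  // Assumed policy: go sparse when the fraction of non-zero entries
  // is at most `maxDensity`; otherwise stay dense.
  final class SmartVectorFactory(maxDensity: Double) extends VectorFactory {
    def create(values: Array[Double]): Vec = {
      val nonZero = values.indices.filter(i => values(i) != 0.0).toArray
      if (values.nonEmpty && nonZero.length.toDouble / values.length <= maxDensity)
        SparseVec(values.length, nonZero, nonZero.map(i => values(i)))
      else
        DenseVec(values)
    }
  }
}
```

Under these assumptions, `new SmartVectorFactory(0.5).create(Array(0.0, 0.0, 3.0, 0.0))` yields a `SparseVec` holding a single index and value, while `DenseVectorFactory` keeps all four entries. The trade-off matches the one stated in the description: sparse centers use less memory, at the cost of slower element access during updates.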
[GitHub] spark pull request: [SPARK-12780] Inconsistency returning value of...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10724#issuecomment-171261203 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12780] Inconsistency returning value of...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10724#issuecomment-171261208 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49312/ Test FAILed.
[GitHub] spark pull request: [SPARK-12780] Inconsistency returning value of...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10724#issuecomment-171261470 **[Test build #49313 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49313/consoleFull)** for PR 10724 at commit [`becc49c`](https://github.com/apache/spark/commit/becc49c2fd137a77836b58730f98f986e2aea36d).
[GitHub] spark pull request: [SPARK-12780] Inconsistency returning value of...
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/10724#issuecomment-171267786 Something wrong with Jenkins?
[GitHub] spark pull request: [SPARK-12780] Inconsistency returning value of...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10724#issuecomment-171261771 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12728][SQL] Integrates SQL generation w...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/10733#issuecomment-171261693 @cloud-fan "CTE within view" is such a test case.
[GitHub] spark pull request: [SPARK-12780] Inconsistency returning value of...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10724#issuecomment-171266455 **[Test build #49314 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49314/consoleFull)** for PR 10724 at commit [`becc49c`](https://github.com/apache/spark/commit/becc49c2fd137a77836b58730f98f986e2aea36d).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12780] Inconsistency returning value of...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10724#issuecomment-171266469 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49314/ Test FAILed.
[GitHub] spark pull request: [SPARK-12780] Inconsistency returning value of...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10724#issuecomment-171266465 Merged build finished. Test FAILed.
[GitHub] spark pull request: Changes to support KMeans with large feature s...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10739#issuecomment-171272165 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-12265][Mesos] Spark calls System.exit i...
Github user nraychaudhuri commented on the pull request: https://github.com/apache/spark/pull/10729#issuecomment-171280201 @dragos do you see it hanging? I introduced the `markErr` method to take care of the countdown latch. It is called from the error handler.
[GitHub] spark pull request: [SPARK-5493] [core] Add option to impersonate ...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/4405#issuecomment-171383089 @hemshankar Please don't use github to ask questions / point out possible issues. See http://spark.apache.org/community.html.