[GitHub] spark issue #13487: [SPARK-15744][SQL] Rename two TungstenAggregation*Suites...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13487

Thank you, @rxin .
[GitHub] spark pull request #13487: [MINOR][SQL] Update testsuites/comments/error mes...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/13487

[MINOR][SQL] Update testsuites/comments/error messages about Tungsten/SortBasedAggregate.

## What changes were proposed in this pull request?

For consistency, this PR updates some remaining `TungstenAggregation/SortBasedAggregate` after SPARK-15728.

- Update a comment in codegen in `VectorizedHashMapGenerator.scala`.
- `TungstenAggregationQuerySuite` --> `HashAggregationQuerySuite`
- `TungstenAggregationQueryWithControlledFallbackSuite` --> `HashAggregationQueryWithControlledFallbackSuite`
- Update two error messages in `SQLQuerySuite.scala` and `AggregationQuerySuite.scala`.
- Update several comments.

## How was this patch tested?

Manual (only comment changes and test suite renamings).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-15744

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13487.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #13487

commit 345b1916d8a6dcfc05c2b4958aec71e21138e3e5
Author: Dongjoon Hyun <dongj...@apache.org>
Date: 2016-06-03T00:21:55Z

    [MINOR][SQL] Update testsuites/comments/error messages about Tungsten/SortBasedAggregate.
[GitHub] spark pull request #13486: [SPARK-15743][SQL] Prevent saving with all-column...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/13486

[SPARK-15743][SQL] Prevent saving with all-column partitioning

## What changes were proposed in this pull request?

When saving datasets to storage, `partitionBy` provides an easy way to construct the directory structure. However, if a user chooses all columns as partition columns, exceptions occur.

- **ORC with all-column partitioning**: `AnalysisException` on **future read** due to schema inference failure.

```
scala> spark.range(10).write.format("orc").mode("overwrite").partitionBy("id").save("/tmp/data")

scala> spark.read.format("orc").load("/tmp/data").collect()
org.apache.spark.sql.AnalysisException: Unable to infer schema for ORC at /tmp/data. It must be specified manually;
```

- **Parquet with all-column partitioning**: `InvalidSchemaException` on **write execution** due to a Parquet limitation.

```
scala> spark.range(100).write.format("parquet").mode("overwrite").partitionBy("id").save("/tmp/data")
[Stage 0:> (0 + 8) / 8]16/06/02 16:51:17 ERROR Utils: Aborting task
org.apache.parquet.schema.InvalidSchemaException: A group type can not be empty. Parquet does not support empty group without leaves. Empty group: spark_schema
... (lots of error messages)
```

Although some formats like JSON support all-column partitioning without any problem, it is rarely a good idea to create lots of empty directories. This PR prevents saving with all-column partitioning by consistently raising `AnalysisException` before saving.

## How was this patch tested?

Newly added `PartitioningUtilsSuite`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-15743

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13486.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #13486

commit bb97467dba96604d26d45763f4115152640ff189
Author: Dongjoon Hyun <dongj...@apache.org>
Date: 2016-06-02T23:14:50Z

    [SPARK-15743][SQL] Prevent saving with all-column partitioning
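[Editor's note] A minimal sketch of the guard this PR introduces, based on the check added to `PartitioningUtils` (the review diffs later in this digest show the actual lines). The object and method names here are illustrative, and the package declaration is only to make `AnalysisException`'s package-private constructor reachable:

```scala
package org.apache.spark.sql.execution.datasources

import org.apache.spark.sql.AnalysisException
import org.apache.spark.sql.types.StructType

object PartitionGuard {
  // Reject the save up front when every column would become a partition
  // column, instead of failing later at read time (ORC) or write time (Parquet).
  def verifyNotAllColumns(schema: StructType, partitionColumns: Seq[String]): Unit = {
    if (partitionColumns.size == schema.fields.size) {
      throw new AnalysisException("Cannot use all columns for partition columns")
    }
  }
}
```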
[GitHub] spark issue #13403: [SPARK-15660][CORE] RDD and Dataset should show the cons...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13403

What about just adding an explicit note on the old `StatCounter.stdev`? http://spark.apache.org/docs/2.0.0-preview/api/scala/index.html#org.apache.spark.util.StatCounter

MLlib's `stat.Statistics` is also consistent with `Dataset`:

```
scala> import org.apache.spark.mllib.linalg.Vectors
scala> import org.apache.spark.mllib.stat.{MultivariateStatisticalSummary, Statistics}
scala> Statistics.colStats(sc.parallelize(Seq(Vectors.dense(1.0), Vectors.dense(2.0), Vectors.dense(3.0)))).variance
res10: org.apache.spark.mllib.linalg.Vector = [1.0]
```
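[Editor's note] For comparison, a small sketch (not part of the original comment) of the Dataset-side result for the same three values; it assumes Spark SQL's `variance`, which is an alias for the sample variance `var_samp`, and therefore agrees with the `[1.0]` above:

```scala
// Sample variance of 1, 2, 3 (denominator n - 1) is 1.0.
spark.range(1, 4).toDF("x").selectExpr("variance(x)").show()
// +-----------+
// |variance(x)|
// +-----------+
// |        1.0|
// +-----------+
```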
[GitHub] spark issue #13403: [SPARK-15660][CORE] RDD and Dataset should show the cons...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13403

Although we cannot change the old API, I think it's a good idea to add `popVariance` and `popStdev` clearly. If everything in this PR is not allowed, what about just adding an explicit note on the old `StatCounter.variance` and `StatCounter.stdev`? http://spark.apache.org/docs/2.0.0-preview/api/scala/index.html#org.apache.spark.util.StatCounter
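[Editor's note] A quick worked sketch (added for illustration, using the conventional definitions) of the population vs. sample distinction behind the proposed `popVariance`/`popStdev` names:

```scala
// For the values 1, 2, 3: mean = 2, sum of squared deviations = 2.
val xs = Seq(1.0, 2.0, 3.0)
val mean = xs.sum / xs.size
val ssd = xs.map(x => (x - mean) * (x - mean)).sum

val popVariance = ssd / xs.size          // 2/3 ≈ 0.667 (divide by n)
val sampleVariance = ssd / (xs.size - 1) // 1.0 (divide by n - 1, matching res10 above)
val popStdev = math.sqrt(popVariance)
val sampleStdev = math.sqrt(sampleVariance)
```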
[GitHub] spark issue #13545: [SPARK-15807][SQL] Support varargs for dropDuplicates in...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13545

Hi, @rxin . I updated this PR and JIRA by removing `distinct`-related changes.
[GitHub] spark pull request #13486: [SPARK-15743][SQL] Prevent saving with all-column...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13486#discussion_r66349438

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/PartitioningUtilsSuite.scala ---

```
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.test.SharedSQLContext
+
+class PartitioningUtilsSuite extends SharedSQLContext {
```

--- End diff --

Sure. No problem. I'll put them into `DataFrameReaderWriterSuite`, too.
[GitHub] spark pull request #13486: [SPARK-15743][SQL] Prevent saving with all-column...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13486#discussion_r66349371

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala ---

```
@@ -339,7 +339,7 @@ private[sql] object PartitioningUtils {
   private val upCastingOrder: Seq[DataType] =
     Seq(NullType, IntegerType, LongType, FloatType, DoubleType, StringType)

-  def validatePartitionColumnDataTypes(
+  def validatePartitionColumnDataTypesAndCount(
```

--- End diff --

Thank you for the review, @marmbrus . That sounds better. I'll update that.
[GitHub] spark issue #13486: [SPARK-15743][SQL] Prevent saving with all-column partit...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13486

Hi, @marmbrus . The PR is now updated according to your advice and has passed Jenkins again.
[GitHub] spark issue #13520: [SPARK-15773][CORE][EXAMPLE] Avoid creating local variab...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13520

Since this is about examples, I think shorter is better. Users can simply think of `parallelize` or `broadcast` as ordinary functions without needing to know about `SparkContext`.
[GitHub] spark issue #13520: [SPARK-15773][CORE][EXAMPLE] Avoid creating local variab...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13520

Initially, I thought the printed message in the statement `println("Creating SparkContext")` was wrong, because `spark.sparkContext` just returns the already existing one.
[GitHub] spark issue #13520: [SPARK-15773][CORE][EXAMPLE] Avoid creating local variab...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13520

Thank you, @srowen !
[GitHub] spark issue #13520: [SPARK-15773][CORE][EXAMPLE] Avoid creating local variab...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13520

Thank you for the review, @rxin and @srowen . The main rationale of this PR is to make `SparkSession` explicitly the starting point for the operations in these examples (instead of `SparkContext`, `sc`). Spark naturally uses `'.'` to build a long sequence of operations, e.g., `sc.parallelize().map().reduce()` or `spark.createDataFrame().toDF().stat.crosstab().show()`. Before `SparkSession`, the starting points were `SparkContext` and `Dataset`/`DataFrame`/`RDD`. This PR treats `SparkSession` and `Dataset`/`DataFrame`/`RDD` as the starting points in these examples and doesn't touch other examples where `sc` is repeated a lot. The other things, like replacing `var` with `val`, are irrelevant; I can revert them.
[GitHub] spark pull request #13545: [SPARK-15807][SQL] Support varargs for distinct/d...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13545#discussion_r66152341

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---

```
@@ -2262,6 +2275,19 @@ class Dataset[T] private[sql](
   def distinct(): Dataset[T] = dropDuplicates()

+  /**
+   * Returns a new [[Dataset]] that contains only the unique rows from this [[Dataset]], considering
+   * only the subset of columns. This is an alias for `dropDuplicates(cols)`.
+   *
+   * Note that, equality checking is performed directly on the encoded representation of the data
+   * and thus is not affected by a custom `equals` function defined on `T`.
+   *
+   * @group typedrel
+   * @since 2.0.0
+   */
+  @scala.annotation.varargs
+  def distinct(cols: String*): Dataset[T] = dropDuplicates(cols)
```

--- End diff --

Thank you always for the fast feedback, @rxin . And for the nice lunch. :) Yes, right. For this, maybe it's not needed because `distinct` is usually used with `select`. Also, we can use `dropDuplicates` since this `distinct` is just an alias of it. I think `distinct` is a function name that is more consistent with SQL. If we have this, we can do this, too:

```
ds.select("_1", "_2", "_3").distinct("_1").orderBy("_1", "_2").show()
```
[GitHub] spark issue #13545: [SPARK-15807][SQL] Support varargs for distinct/dropDupl...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13545

What do you think about `dropDuplicates`?

1. `ds.select("_1", "_2", "_3").dropDuplicates(Seq("_1", "_2")).orderBy("_1", "_2").show()`
2. `ds.select("_1", "_2", "_3").dropDuplicates("_1", "_2").orderBy("_1", "_2").show()`

I think the second is more consistent with the others, `select` and `orderBy`. Do you dislike this one too?
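[Editor's note] A hedged sketch of how a varargs overload can delegate to an existing `Seq`-based method so that both forms 1 and 2 above compile. This is a toy stand-in class, not Spark's actual implementation (the merged Spark signature per this PR is `dropDuplicates(col1: String, cols: String*)`):

```scala
// Toy table: each row is a column-name -> value map.
class Table(rows: Seq[Map[String, Any]]) {
  // Existing Seq-based form (form 1): keep one row per distinct key tuple.
  def dropDuplicates(cols: Seq[String]): Table =
    new Table(rows.groupBy(r => cols.map(r.get)).map(_._2.head).toSeq)

  // Varargs form (form 2) simply forwards to the Seq-based one.
  @scala.annotation.varargs
  def dropDuplicates(col1: String, cols: String*): Table =
    dropDuplicates(col1 +: cols)
}
```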
[GitHub] spark pull request #13545: [SPARK-15807][SQL] Support varargs for distinct/d...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13545#discussion_r66156310

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- (the same hunk as quoted in the previous message) --- End diff --

In addition, `distinct` in the `dplyr` R package works in the same manner.
[GitHub] spark issue #13486: [SPARK-15743][SQL] Prevent saving with all-column partit...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13486

Hi, @marmbrus . Could you review this PR?
[GitHub] spark pull request #13634: [SPARK-15913][CORE] Dispatcher.stopped should be ...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/13634

[SPARK-15913][CORE] Dispatcher.stopped should be enclosed by synchronized block.

## What changes were proposed in this pull request?

`Dispatcher.stopped` is guarded by `this`, but it is used without synchronization in the `postMessage` function. This PR fixes that and also makes the exception message more accurate.

## How was this patch tested?

Pass the existing Jenkins tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-15913

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13634.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #13634

commit 75a5254371374faf66f166e1b2683d3f9803cb8e
Author: Dongjoon Hyun <dongj...@apache.org>
Date: 2016-06-13T05:53:47Z

    [SPARK-15913][CORE] Dispatcher.stopped should be enclosed by synchronized block.
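[Editor's note] A minimal standalone sketch of the pattern being fixed; names are illustrative, not the actual `Dispatcher` code. Reads of a field guarded by `this` must happen under the same lock as its writes, and the resulting error is surfaced outside the critical section, which is also the shape the review diff later in this digest converges on:

```scala
class StoppableQueue {
  private val messages = new java.util.ArrayDeque[String]()
  private var stopped = false // guarded by `this`

  def post(message: String): Unit = {
    // Read `stopped` only while holding the lock that guards its writes.
    val error: Option[Exception] = synchronized {
      if (stopped) {
        Some(new IllegalStateException("Queue already stopped."))
      } else {
        messages.addLast(message)
        None
      }
    }
    // Handle the error outside the lock to keep the critical section small.
    error.foreach(e => println(s"Dropping $message: ${e.getMessage}"))
  }

  def stop(): Unit = synchronized { stopped = true }
}
```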
[GitHub] spark issue #13634: [SPARK-15913][CORE] Dispatcher.stopped should be enclose...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13634

Hi, @vanzin . Could you review this when you have some time?
[GitHub] spark issue #13436: [SPARK-15696][SQL] Improve `crosstab` to have a consiste...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13436

Thank you, @rxin .
[GitHub] spark issue #13486: [SPARK-15743][SQL] Prevent saving with all-column partit...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13486

Hi, @marmbrus . Could you review this PR again?
[GitHub] spark issue #13608: [SPARK-15883][MLLIB][DOCS] Fix broken links in mllib doc...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13608

Actually, at this time, I manually clicked every link in the MLlib documentation. Maybe later we can build a simple crawler to check for this kind of error.
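[Editor's note] A hedged sketch of the "simple crawler" idea above (purely illustrative, not part of any PR): scan a generated HTML page for absolute links and report the ones that fail to resolve.

```scala
import java.net.{HttpURLConnection, URL}
import scala.io.Source
import scala.util.Try

// Match absolute href targets in the rendered HTML.
val hrefPattern = "href=\"(https?://[^\"]+)\"".r

def brokenLinks(htmlFile: String): Seq[String] = {
  val html = Source.fromFile(htmlFile).mkString
  hrefPattern.findAllMatchIn(html).map(_.group(1)).toSeq.distinct.filter { link =>
    Try {
      val conn = new URL(link).openConnection().asInstanceOf[HttpURLConnection]
      conn.setRequestMethod("HEAD")
      conn.getResponseCode >= 400 // 4xx/5xx means the target is broken
    }.getOrElse(true)             // connection failures count as broken, too
  }
}
```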
[GitHub] spark issue #13486: [SPARK-15743][SQL] Prevent saving with all-column partit...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13486

Thank you, @marmbrus !
[GitHub] spark pull request #13608: [SPARK-15883][MLLIB][DOCS] Fix broken links in ml...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13608#discussion_r66682287

--- Diff: docs/mllib-data-types.md ---

```
@@ -535,12 +537,6 @@ rowsRDD = mat.rows

 # Convert to a RowMatrix by dropping the row indices.
 rowMat = mat.toRowMatrix()
-
-# Convert to a CoordinateMatrix.
-coordinateMat = mat.toCoordinateMatrix()
-
-# Convert to a BlockMatrix.
-blockMat = mat.toBlockMatrix()
```

--- End diff --

This is redundant and inconsistent code that exists only in the Python part of this section.
[GitHub] spark pull request #13608: [SPARK-15883][MLLIB][DOCS] Fix broken links in ml...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13608#discussion_r66682877

--- Diff: docs/mllib-linear-methods.md ---

```
@@ -185,10 +185,10 @@ algorithm for 200 iterations.
 import org.apache.spark.mllib.optimization.L1Updater

 val svmAlg = new SVMWithSGD()
-svmAlg.optimizer.
-  setNumIterations(200).
-  setRegParam(0.1).
-  setUpdater(new L1Updater)
+svmAlg.optimizer
```

--- End diff --

I changed the trailing dot ('.').
[GitHub] spark issue #13608: [SPARK-15883][MLLIB][DOCS] Fix broken links in mllib doc...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13608

Yep. I already built this with Jekyll locally and checked the result manually, too.
[GitHub] spark issue #13608: [SPARK-15883][MLLIB][DOCS] Fix broken links in mllib doc...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13608

Thank you for the fast review, @srowen .
[GitHub] spark pull request #13608: [SPARK-15883][MLLIB][DOCS] Fix broken links in ml...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/13608

[SPARK-15883][MLLIB][DOCS] Fix broken links in mllib documents

## What changes were proposed in this pull request?

This PR fixes all broken links in the Spark 2.0 preview MLlib documents. It also contains some editorial changes.

**Fix broken links**
* mllib-data-types.md
* mllib-decision-tree.md
* mllib-ensembles.md
* mllib-feature-extraction.md
* mllib-pmml-model-export.md
* mllib-statistics.md

**Fix malformed section header and Scala coding style**
* mllib-linear-methods.md

**Replace indirect forward links with direct ones**
* ml-classification-regression.md

## How was this patch tested?

Manual tests (with `cd docs; jekyll build`).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-15883

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13608.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #13608

commit 3e4cdc14a386e3a1d8e301995450db255b32486a
Author: Dongjoon Hyun <dongj...@apache.org>
Date: 2016-06-10T21:11:31Z

    [SPARK-15883][MLLIB][DOCS] Fix broken links in mllib documents
[GitHub] spark issue #13520: [SPARK-15773][CORE][EXAMPLE] Avoid creating local variab...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13520

Thank you, @rxin !
[GitHub] spark issue #13545: [SPARK-15807][SQL] Support varargs for dropDuplicates in...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13545

Thank you again, @rxin .
[GitHub] spark pull request #13608: [SPARK-15883][MLLIB][DOCS] Fix broken links in ml...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13608#discussion_r66687221

--- Diff: docs/mllib-linear-methods.md ---

```
@@ -395,7 +395,7 @@ section of the Spark quick-start guide. Be sure to also include *spark-mllib*
 to your build file as a dependency.

-###Streaming linear regression
```

--- End diff --

Currently, this does not render as a section header (there is no space after `###`).
[GitHub] spark issue #13545: [SPARK-15807][SQL] Support varargs for dropDuplicates in...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13545

Hi, @rxin . For `dropDuplicates`, this PR definitely adds a new signature. However, I think this is the right direction for improving the user experience, because users expect the same usage pattern for `dropDuplicates`.
[GitHub] spark pull request #13608: [SPARK-15883][MLLIB][DOCS] Fix broken links in ml...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13608#discussion_r66703204

--- Diff: docs/mllib-linear-methods.md --- (the same hunk as quoted earlier in this digest) --- End diff --

Thanks. :)
[GitHub] spark issue #13436: [SPARK-15696][SQL] Improve `crosstab` to have a consiste...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13436

Hi, @rxin . Could you review this PR and share your opinion when you have some time?
[GitHub] spark pull request #13486: [SPARK-15743][SQL] Prevent saving with all-column...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13486#discussion_r65799702

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala ---

```
@@ -350,6 +350,10 @@ private[sql] object PartitioningUtils {
       case _ => throw new AnalysisException(s"Cannot use ${field.dataType} for partition column")
     }
   }
+
+    if (partitionColumns.size == schema.fields.size) {
+      throw new AnalysisException(s"Cannot use all columns for partition columns")
+    }
 }
```

--- End diff --

Then, let's change it. :) Since `PartitioningUtils` is `private[sql]`, it's safe to change. I'll update this PR. Thank you for your review and idea!
[GitHub] spark issue #13486: [SPARK-15743][SQL] Prevent saving with all-column partit...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13486

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59987/
[GitHub] spark pull request #13486: [SPARK-15743][SQL] Prevent saving with all-column...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13486#discussion_r65799585

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala --- (the same hunk as quoted above) --- End diff --

Thank you for the attention, @wangyang1992 . Good point! Maybe `validatePartitionColumnDataTypes` -> `validatePartitionColumnDataTypesAndCount`?
[GitHub] spark issue #13486: [SPARK-15743][SQL] Prevent saving with all-column partit...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13486

**[Test build #59986 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59986/consoleFull)** for PR 13486 at commit [`9c5f13d`](https://github.com/apache/spark/commit/9c5f13d6e7c020fb7d983e607116683e4b007f05).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #13486: [SPARK-15743][SQL] Prevent saving with all-column partit...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13486

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59986/
[GitHub] spark issue #13486: [SPARK-15743][SQL] Prevent saving with all-column partit...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13486

Merged build finished. Test PASSed.
[GitHub] spark issue #13486: [SPARK-15743][SQL] Prevent saving with all-column partit...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13486

**[Test build #59987 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59987/consoleFull)** for PR 13486 at commit [`6a9006d`](https://github.com/apache/spark/commit/6a9006d25a1566b4e17021bff7405992f872e6c6).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #13486: [SPARK-15743][SQL] Prevent saving with all-column partit...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13486

Merged build finished. Test PASSed.
[GitHub] spark pull request: [MINOR][CORE] Fix a HadoopRDD log message and ...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/13294#issuecomment-221656682

Thank you, @andrewor14 and @srowen !
[GitHub] spark pull request: [SPARK-15512][CORE] repartition(0) should rais...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/13282#issuecomment-221426314

Yes. They need this. I'll add that.
[GitHub] spark issue #13545: [SPARK-15807][SQL] Support varargs for dropDuplicates in...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13545

Thank you for merging, @rxin !
[GitHub] spark issue #13634: [SPARK-15913][CORE] Dispatcher.stopped should be enclose...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13634

Thank you for the review, @srowen . Oh, right. That sounds much better to me. I'll update this PR like that.
[GitHub] spark pull request #13634: [SPARK-15913][CORE] Dispatcher.stopped should be ...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13634#discussion_r66761760

--- Diff: core/src/main/scala/org/apache/spark/rpc/netty/Dispatcher.scala ---

```
@@ -144,24 +144,21 @@ private[netty] class Dispatcher(nettyEnv: NettyRpcEnv) extends Logging {
       endpointName: String,
       message: InboxMessage,
       callbackIfStopped: (Exception) => Unit): Unit = {
-    val shouldCallOnStop = synchronized {
+    val error: Option[Exception] = synchronized {
       val data = endpoints.get(endpointName)
-      if (stopped || data == null) {
-        true
+      if (stopped) {
+        Some(new RpcEnvStoppedException())
+      } else if (data == null) {
+        Some(new SparkException(s"Could not find $endpointName."))
       } else {
         data.inbox.post(message)
         receivers.offer(data)
-        false
+        None
       }
     }
-    if (shouldCallOnStop) {
+    if (error.isDefined) {
```

--- End diff --

Thank you again. I'll change both, too.
[GitHub] spark issue #13634: [SPARK-15913][CORE] Dispatcher.stopped should be enclose...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13634

Thank you always, @srowen .
[GitHub] spark issue #13634: [SPARK-15913][CORE] Dispatcher.stopped should be enclose...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13634

Thank you, @vanzin !
[GitHub] spark pull request #13643: [SPARK-15922][MLLIB] `toIndexedRowMatrix` should ...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/13643

[SPARK-15922][MLLIB] `toIndexedRowMatrix` should consider the case `cols < colsPerBlock`

## What changes were proposed in this pull request?

SPARK-15922 reports that the following scenario throws an exception due to mismatched vector sizes. This PR handles the exceptional case `cols < colsPerBlock`.

**Before**
```scala
scala> import org.apache.spark.mllib.linalg.distributed._
scala> import org.apache.spark.mllib.linalg._
scala> val rows = IndexedRow(0L, new DenseVector(Array(1,2,3))) :: IndexedRow(1L, new DenseVector(Array(1,2,3))) :: IndexedRow(2L, new DenseVector(Array(1,2,3))) :: Nil
scala> val rdd = sc.parallelize(rows)
scala> val matrix = new IndexedRowMatrix(rdd, 3, 3)
scala> val bmat = matrix.toBlockMatrix
scala> val imat = bmat.toIndexedRowMatrix
scala> imat.rows.collect
... throws an exception
```

**After**
```scala
...
scala> imat.rows.collect
res0: Array[org.apache.spark.mllib.linalg.distributed.IndexedRow] = Array(IndexedRow(0,[1.0,2.0,3.0]), IndexedRow(1,[1.0,2.0,3.0]), IndexedRow(2,[1.0,2.0,3.0]))
```

## How was this patch tested?

Pass the Jenkins tests (including the above case).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-15922

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13643.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #13643

commit 85054becae5eb0075620bd674d534ea27a9268b5
Author: Dongjoon Hyun <dongj...@apache.org>
Date: 2016-06-13T18:28:05Z

    [SPARK-15922][MLLIB] `toIndexedRowMatrix` should consider the case `cols < colsPerBlock`
[GitHub] spark pull request #13684: [SPARK-15908][R] Add varargs-type dropDuplicates(...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/13684

[SPARK-15908][R] Add varargs-type dropDuplicates() function in SparkR

## What changes were proposed in this pull request?

This PR adds a varargs-type `dropDuplicates` function to SparkR for API parity. Refer to https://issues.apache.org/jira/browse/SPARK-15807, too.

## How was this patch tested?

Pass the Jenkins tests with new testcases.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-15908

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/13684.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #13684

commit f1d6355af9dc8e782680a1fc3fac07f8ca31b82b
Author: Dongjoon Hyun <dongj...@apache.org>
Date: 2016-06-15T10:08:28Z

    [SPARK-15908][R] Add varargs-type dropDuplicates() function in SparkR
[GitHub] spark issue #13636: [SPARK-15637][SPARK-15931][SPARKR] Fix R masked function...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13636

This passes for me, too. Thank you, @felixcheung .
[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13592#discussion_r67202830

--- Diff: docs/sql-programming-guide.md ---

```
@@ -889,7 +887,7 @@ df.select("name", "favorite_color").write.save("namesAndFavColors.parquet")

 {% highlight r %}
-df <- read.df(sqlContext, "examples/src/main/resources/users.parquet")
+df <- read.df(spark, "examples/src/main/resources/users.parquet")
```

--- End diff --

```
df <- read.df("examples/src/main/resources/users.parquet")
```
[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13592#discussion_r67202934

--- Diff: docs/sql-programming-guide.md ---

```
@@ -939,7 +937,7 @@ df.select("name", "age").write.save("namesAndAges.parquet", format="parquet")

 {% highlight r %}
-df <- read.df(sqlContext, "examples/src/main/resources/people.json", "json")
+df <- read.df(spark, "examples/src/main/resources/people.json", "json")
```

--- End diff --

```
df <- read.df("examples/src/main/resources/people.json", "json")
```
[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13592#discussion_r67203021

--- Diff: docs/sql-programming-guide.md ---

```
@@ -956,30 +954,30 @@ file directly with SQL.

 {% highlight scala %}
-val df = sqlContext.sql("SELECT * FROM parquet.`examples/src/main/resources/users.parquet`")
+val df = spark.sql("SELECT * FROM parquet.`examples/src/main/resources/users.parquet`")
 {% endhighlight %}

 {% highlight java %}
-DataFrame df = sqlContext.sql("SELECT * FROM parquet.`examples/src/main/resources/users.parquet`");
+Dataset df = spark.sql("SELECT * FROM parquet.`examples/src/main/resources/users.parquet`");
 {% endhighlight %}

 {% highlight python %}
-df = sqlContext.sql("SELECT * FROM parquet.`examples/src/main/resources/users.parquet`")
+df = spark.sql("SELECT * FROM parquet.`examples/src/main/resources/users.parquet`")
 {% endhighlight %}

 {% highlight r %}
-df <- sql(sqlContext, "SELECT * FROM parquet.`examples/src/main/resources/users.parquet`")
+df <- sql(spark, "SELECT * FROM parquet.`examples/src/main/resources/users.parquet`")
```

--- End diff --

The same:

```
df <- sql("SELECT * FROM parquet.`examples/src/main/resources/users.parquet`")
```
[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13592#discussion_r67203777

--- Diff: docs/sql-programming-guide.md ---

```
@@ -1142,11 +1141,11 @@ write.parquet(schemaPeople, "people.parquet")

 # Read in the Parquet file created above. Parquet files are self-describing so the schema is preserved.
 # The result of loading a parquet file is also a DataFrame.
-parquetFile <- read.parquet(sqlContext, "people.parquet")
+parquetFile <- read.parquet(spark, "people.parquet")

 # Parquet files can also be used to create a temporary view and then used in SQL statements.
 registerTempTable(parquetFile, "parquetFile")
```

--- End diff --

```
createOrReplaceTempView(parquetFile, "parquetFile")
```
[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13592#discussion_r67204146

--- Diff: docs/sql-programming-guide.md ---
@@ -1142,11 +1141,11 @@ write.parquet(schemaPeople, "people.parquet")
 # Read in the Parquet file created above. Parquet files are self-describing so the schema is preserved.
 # The result of loading a parquet file is also a DataFrame.
-parquetFile <- read.parquet(sqlContext, "people.parquet")
+parquetFile <- read.parquet(spark, "people.parquet")
 # Parquet files can also be used to create a temporary view and then used in SQL statements.
 registerTempTable(parquetFile, "parquetFile")
-teenagers <- sql(sqlContext, "SELECT name FROM parquetFile WHERE age >= 13 AND age <= 19")
+teenagers <- sql(spark, "SELECT name FROM parquetFile WHERE age >= 13 AND age <= 19")
--- End diff --

```
teenagers <- sql("SELECT name FROM parquetFile WHERE age >= 13 AND age <= 19")
```
[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13592#discussion_r67202307

--- Diff: docs/sql-programming-guide.md ---
@@ -171,9 +171,9 @@ df.show()
 {% highlight r %}
-sqlContext <- SQLContext(sc)
+spark <- SparkSession(sc)

-df <- read.json(sqlContext, "examples/src/main/resources/people.json")
+df <- read.json(spark, "examples/src/main/resources/people.json")
--- End diff --

In `SparkR`, the above is deprecated. We can now use the following instead.

```
df <- read.json("examples/src/main/resources/people.json")
```
[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13592#discussion_r67202506

--- Diff: docs/sql-programming-guide.md ---
@@ -363,10 +363,10 @@ In addition to simple column references and expressions, DataFrames also have a
 {% highlight r %}
-sqlContext <- sparkRSQL.init(sc)
+spark <- sparkRSQL.init(sc)

 # Create the DataFrame
-df <- read.json(sqlContext, "examples/src/main/resources/people.json")
+df <- read.json(spark, "examples/src/main/resources/people.json")
--- End diff --

We can remove the following:

```
spark <- sparkRSQL.init(sc)
```

and use this instead:

```
df <- read.json("examples/src/main/resources/people.json")
```
[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13592#discussion_r67202611

--- Diff: docs/sql-programming-guide.md ---
@@ -419,35 +419,35 @@ In addition to simple column references and expressions, DataFrames also have a
 ## Running SQL Queries Programmatically
-The `sql` function on a `SQLContext` enables applications to run SQL queries programmatically and returns the result as a `DataFrame`.
+The `sql` function on a `SparkSession` enables applications to run SQL queries programmatically and returns the result as a `DataFrame`.
 {% highlight scala %}
-val sqlContext = ... // An existing SQLContext
-val df = sqlContext.sql("SELECT * FROM table")
+val spark = ... // An existing SparkSession
+val df = spark.sql("SELECT * FROM table")
 {% endhighlight %}
 {% highlight java %}
-SQLContext sqlContext = ... // An existing SQLContext
-DataFrame df = sqlContext.sql("SELECT * FROM table")
+SparkSession spark = ... // An existing SparkSession
+Dataset df = spark.sql("SELECT * FROM table")
 {% endhighlight %}
 {% highlight python %}
-from pyspark.sql import SQLContext
-sqlContext = SQLContext(sc)
-df = sqlContext.sql("SELECT * FROM table")
+from pyspark.sql import SparkSession
+spark = SparkSession(sc)
+df = spark.sql("SELECT * FROM table")
 {% endhighlight %}
 {% highlight r %}
-sqlContext <- sparkRSQL.init(sc)
-df <- sql(sqlContext, "SELECT * FROM table")
+spark <- sparkRSQL.init(sc)
+df <- sql(spark, "SELECT * FROM table")
--- End diff --

Here, too. Remove `spark <- sparkRSQL.init(sc)` and use

```
df <- sql("SELECT * FROM table")
```
[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13592#discussion_r67203431

--- Diff: docs/sql-programming-guide.md ---
@@ -1142,11 +1141,11 @@ write.parquet(schemaPeople, "people.parquet")
 # Read in the Parquet file created above. Parquet files are self-describing so the schema is preserved.
 # The result of loading a parquet file is also a DataFrame.
-parquetFile <- read.parquet(sqlContext, "people.parquet")
+parquetFile <- read.parquet(spark, "people.parquet")
--- End diff --

```
parquetFile <- read.parquet("people.parquet")
```
[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13592#discussion_r67204480

--- Diff: docs/sql-programming-guide.md ---
@@ -1326,7 +1325,7 @@ write.df(df1, "data/test_table/key=1", "parquet", "overwrite")
 write.df(df2, "data/test_table/key=2", "parquet", "overwrite")
 # Read the partitioned table
-df3 <- read.df(sqlContext, "data/test_table", "parquet", mergeSchema="true")
+df3 <- read.df(spark, "data/test_table", "parquet", mergeSchema="true")
--- End diff --

```
df3 <- read.df("data/test_table", "parquet", mergeSchema="true")
```
[GitHub] spark issue #13684: [SPARK-15908][R] Add varargs-type dropDuplicates() funct...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13684

Hi, @shivaram . Could you review this PR?
[GitHub] spark pull request #13684: [SPARK-15908][R] Add varargs-type dropDuplicates(...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13684#discussion_r67206540

--- Diff: R/pkg/R/DataFrame.R ---
@@ -1859,7 +1859,7 @@ setMethod("where",
 #' @param colnames A character vector of column names.
--- End diff --

Oh, thank you for the review, @shivaram . Sure, I'll update the doc. Maybe something like the following?

```
- #' @param colnames A character vector of column names.
+ #' @param col A character vector of column names or a single column name
+ #' @param ... Additional column names
```
[GitHub] spark pull request #13684: [SPARK-15908][R] Add varargs-type dropDuplicates(...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13684#discussion_r67209107

--- Diff: R/pkg/R/DataFrame.R ---
@@ -1859,7 +1859,7 @@ setMethod("where",
 #' @param colnames A character vector of column names.
--- End diff --

Yep. Right. I will add that.
[GitHub] spark pull request #13684: [SPARK-15908][R] Add varargs-type dropDuplicates(...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13684#discussion_r67210755

--- Diff: R/pkg/R/DataFrame.R ---
@@ -1869,6 +1869,7 @@ setMethod("where",
 #' path <- "path/to/file.json"
 #' df <- read.json(path)
 #' dropDuplicates(df)
+#' dropDuplicates(df, "col1", "col2")
 #' dropDuplicates(df, c("col1", "col2"))
 #' }
 setMethod("dropDuplicates",
--- End diff --

Actually, I kept the existing `dropDuplicates` since it also handles `dropDuplicates(df)` for all columns. Don't we still need two functions if we move the `c(...)` case?
[GitHub] spark issue #13643: [SPARK-15922][MLLIB] `toIndexedRowMatrix` should conside...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13643

Thank you again, @srowen .
[GitHub] spark issue #13643: [SPARK-15922][MLLIB] `toIndexedRowMatrix` should conside...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/13643

Hi, @Fokko and @mengxr . Could you review this PR when you have some time?
[GitHub] spark pull request #13643: [SPARK-15922][MLLIB] `toIndexedRowMatrix` should ...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13643#discussion_r66849603

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala ---
@@ -288,7 +288,7 @@ class BlockMatrix @Since("1.3.0") (
     vectors.foreach { case (blockColIdx: Int, vec: BV[Double]) =>
       val offset = colsPerBlock * blockColIdx
-      wholeVector(offset until offset + colsPerBlock) := vec
+      wholeVector(offset until offset + Math.min(cols, colsPerBlock)) := vec
--- End diff --

Oh, thank you. Mathematically, yours is correct. I'll fix this.
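To see why the slice needs a bound, consider a matrix with fewer columns than `colsPerBlock`: the destination vector is only `cols` long, so an unclamped slice of width `colsPerBlock` reads past its end. A minimal standalone sketch with Breeze (shapes chosen for illustration, not taken from the PR's tests):

```scala
import breeze.linalg.{DenseVector => BDV}

object SliceBoundSketch {
  def main(args: Array[String]): Unit = {
    val cols = 2          // total columns in the matrix (assumed)
    val colsPerBlock = 3  // block width larger than the whole matrix
    val wholeVector = BDV.zeros[Double](cols)
    val vec = BDV(1.0, 2.0)
    val offset = 0
    // wholeVector(offset until offset + colsPerBlock) := vec  // would overrun the vector
    wholeVector(offset until offset + math.min(cols, colsPerBlock)) := vec  // bounded, matches vec's length
    println(wholeVector)  // DenseVector(1.0, 2.0)
  }
}
```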
[GitHub] spark pull request #13520: [SPARK-15773][CORE][EXAMPLE] Avoid creating local...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/13520

[SPARK-15773][CORE][EXAMPLE] Avoid creating local variable `sc` in examples if possible

## What changes were proposed in this pull request?

Instead of using a local variable `sc` as in the following example, this PR uses `spark.sparkContext`. This makes the examples more concise and also fixes some misleading code, i.e., creating a SparkContext from a SparkSession.

```
-println("Creating SparkContext")
-val sc = spark.sparkContext
-
 println("Writing local file to DFS")
 val dfsFilename = dfsDirPath + "/dfs_read_write_test"
-val fileRDD = sc.parallelize(fileContents)
+val fileRDD = spark.sparkContext.parallelize(fileContents)
```

This will change 12 files (+30 lines, -52 lines).

## How was this patch tested?

Manual.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-15773

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13520.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #13520

commit 0a5d82fc8c1b3e0910231060090181e143e5215a
Author: Dongjoon Hyun <dongj...@apache.org>
Date: 2016-06-05T21:42:42Z

    [SPARK-15773][CORE][EXAMPLE] Avoid creating local variable `sc` in examples if possible
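A self-contained sketch of the pattern the PR promotes (master, app name, and data are assumed for illustration): take the `SparkContext` from the `SparkSession` at the point of use rather than binding it to a local `sc`.

```scala
import org.apache.spark.sql.SparkSession

object NoLocalScSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").appName("no-local-sc").getOrCreate()
    // Before: val sc = spark.sparkContext; val fileRDD = sc.parallelize(...)
    // After: reach through the session directly, so no local `sc` is created.
    val fileRDD = spark.sparkContext.parallelize(Seq("line1", "line2"))
    println(fileRDD.count())  // 2
    spark.stop()
  }
}
```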
[GitHub] spark pull request #13545: [SPARK-15807][SQL] Support varargs for distinct/d...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/13545

[SPARK-15807][SQL] Support varargs for distinct/dropDuplicates in Dataset/DataFrame

## What changes were proposed in this pull request?

This PR adds `varargs`-type `distinct`/`dropDuplicates` functions to `Dataset`/`DataFrame`. Currently, `distinct` does not take arguments, and `dropDuplicates` supports only `Seq` or `Array`.

**Before**
```scala
scala> val ds = spark.createDataFrame(Seq(("a", 1), ("b", 2), ("a", 2)))
ds: org.apache.spark.sql.DataFrame = [_1: string, _2: int]

scala> ds.dropDuplicates(Seq("_1", "_2"))
res0: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [_1: string, _2: int]

scala> ds.dropDuplicates("_1", "_2")
:26: error: overloaded method value dropDuplicates with alternatives:
  (colNames: Array[String])org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
  (colNames: Seq[String])org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
  ()org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
 cannot be applied to (String, String)
       ds.dropDuplicates("_1", "_2")
          ^

scala> ds.distinct("_1", "_2")
:26: error: too many arguments for method distinct: ()org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
       ds.distinct("_1", "_2")
```

**After**
```scala
scala> val ds = spark.createDataFrame(Seq(("a", 1), ("b", 2), ("a", 2)))
ds: org.apache.spark.sql.DataFrame = [_1: string, _2: int]

scala> ds.dropDuplicates("_1", "_2")
res0: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [_1: string, _2: int]

scala> ds.distinct("_1", "_2")
res1: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [_1: string, _2: int]
```

## How was this patch tested?

Pass the Jenkins tests with new test cases.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-15807

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13545.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #13545

commit 33f446f4bb04e2ea0014c385b6f0d1b290db5a90
Author: Dongjoon Hyun <dongj...@apache.org>
Date: 2016-06-07T18:34:24Z

    [SPARK-15807][SQL] Support varargs for distinct/dropDuplicates
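For reference, a minimal sketch of the delegation such varargs overloads typically use (the trait and signatures below are assumed for illustration and are not the actual Spark source): the new `String*` method simply forwards to the existing `Seq`-based one.

```scala
// Hypothetical stand-in for Dataset, showing only the overload shape.
trait Deduplicable[T] {
  def dropDuplicates(colNames: Seq[String]): Deduplicable[T] // existing Seq-based API

  // New varargs form: ds.dropDuplicates("_1", "_2") delegates to the Seq version.
  def dropDuplicates(col1: String, cols: String*): Deduplicable[T] =
    dropDuplicates(col1 +: cols)
}
```

Requiring an explicit first `col1` keeps the zero-argument `dropDuplicates()` (all columns) from colliding with the varargs overload.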
[GitHub] spark pull request: [SPARK-15644] [MLlib] [SQL] Replace SQLContext...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/13380#issuecomment-222337348

Hi, @gatorsmile . Personally, I love this PR. :) I just hesitated to change the function signatures of MLLIB in #13352 .
[GitHub] spark pull request: [SPARK-15644] [MLlib] [SQL] Replace SQLContext...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13380#discussion_r64996971

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/joins/BroadcastJoinSuite.scala ---
@@ -48,7 +48,7 @@ class BroadcastJoinSuite extends QueryTest with SQLTestUtils {
       .setMaster("local-cluster[2,1,1024]")
       .setAppName("testing")
     val sc = new SparkContext(conf)
-    spark = SparkSession.builder.getOrCreate()
--- End diff --

In your PR, only this line is related.
[GitHub] spark pull request: [SPARK-15644] [MLlib] [SQL] Replace SQLContext...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13380#discussion_r64996960

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/joins/BroadcastJoinSuite.scala ---
@@ -48,7 +48,7 @@ class BroadcastJoinSuite extends QueryTest with SQLTestUtils {
       .setMaster("local-cluster[2,1,1024]")
       .setAppName("testing")
     val sc = new SparkContext(conf)
-    spark = SparkSession.builder.getOrCreate()
--- End diff --

FYI, after #13352 , I proceeded to #13365 ([SPARK-15618][SQL][MLLIB] Use SparkSession.builder.sparkContext if applicable.) You can fix the above line like the following:

```
- spark = SparkSession.builder().config(sc.getConf).getOrCreate()
+ spark = SparkSession.builder().sparkContext(sc).getOrCreate()
```
[GitHub] spark pull request: [SPARK-15644] [MLlib] [SQL] Replace SQLContext...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13380#discussion_r64997097

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/joins/BroadcastJoinSuite.scala ---
@@ -48,7 +48,7 @@ class BroadcastJoinSuite extends QueryTest with SQLTestUtils {
       .setMaster("local-cluster[2,1,1024]")
       .setAppName("testing")
     val sc = new SparkContext(conf)
-    spark = SparkSession.builder.getOrCreate()
--- End diff --

Oh, I checked my PR again and found that I had already fixed this in my PR. Yes, right. You had better revert this line to avoid unnecessary conflicts.
[GitHub] spark pull request: [SPARK-15644] [MLlib] [SQL] Replace SQLContext...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13380#discussion_r64997103

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/joins/BroadcastJoinSuite.scala ---
@@ -48,7 +48,7 @@ class BroadcastJoinSuite extends QueryTest with SQLTestUtils {
       .setMaster("local-cluster[2,1,1024]")
       .setAppName("testing")
     val sc = new SparkContext(conf)
-    spark = SparkSession.builder.getOrCreate()
--- End diff --

Here: https://github.com/apache/spark/pull/13365/files#diff-d8244612a613500ec2c52e9ef0538376R47
[GitHub] spark pull request: [SPARK-15647] [SQL] Fix Boundary Cases in Opti...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/13392#issuecomment-222346595

Thank you for bringing me up to date, @gatorsmile ! By the way, there is one correction: my PR is about **parameterizing** the following pre-existing code. :)

```
def shouldCodegen: Boolean = branches.length < CaseWhen.MAX_NUM_CASES_FOR_CODEGEN
```
[GitHub] spark pull request: [SPARK-15557][SQL] expressi[on ((cast(99 as de...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13368#discussion_r64979724

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala ---
@@ -290,11 +290,6 @@ object TypeCoercion {
       // Skip nodes who's children have not been resolved yet.
       case e if !e.childrenResolved => e

-      case a @ BinaryArithmetic(left @ StringType(), right @ DecimalType.Expression(_, _)) =>
-        a.makeCopy(Array(Cast(left, DecimalType.SYSTEM_DEFAULT), right))
-      case a @ BinaryArithmetic(left @ DecimalType.Expression(_, _), right @ StringType()) =>
-        a.makeCopy(Array(left, Cast(right, DecimalType.SYSTEM_DEFAULT)))
-
--- End diff --

Hi, @dilipbiswal . IMHO, the root cause seems to be **decimal multiplication** between `decimal(38,18)`s:

```
scala> sql("select cast(10 as decimal(38,18)) * cast(10 as decimal(38,18))").head
res0: org.apache.spark.sql.Row = [null]
```

What do you think about this?
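For context, the usual SQL-standard sizing rule for decimal multiplication makes the overflow visible (the exact rule Spark applies may differ by version; this is a sketch of the common convention):

```
-- result type for decimal(p1,s1) * decimal(p2,s2), before capping:
precision = p1 + p2 + 1 = 38 + 38 + 1 = 77   -- far above the 38-digit maximum
scale     = s1 + s2     = 18 + 18     = 36
```

After capping precision at 38 while keeping 36 digits of scale, only 2 integer digits remain, so even 10 * 10 = 100 cannot be represented, which is consistent with the `null` result above.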
[GitHub] spark pull request: [MINOR][CORE][DOCS] Fix description of FilterF...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/13404#issuecomment-222611700

Thank you, @rxin ! Then, I'll close this PR now.
[GitHub] spark pull request: [MINOR][CORE][DOCS] Fix description of FilterF...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13404#discussion_r65125923

--- Diff: core/src/main/java/org/apache/spark/api/java/function/package.scala ---
@@ -22,4 +22,5 @@ package org.apache.spark.api.java
 * these interfaces to pass functions to various Java API methods for Spark. Please visit Spark's
 * Java programming guide for more details.
 */
-package object function
--- End diff --

This just removes one trailing space and adds one blank line.
[GitHub] spark pull request: [MINOR][CORE][DOCS] Fix description of FilterF...
Github user dongjoon-hyun closed the pull request at: https://github.com/apache/spark/pull/13404
[GitHub] spark pull request: [MINOR][CORE][DOC] Fix description of FilterFun...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/13404

[MINOR][CORE][DOC] Fix description of FilterFunction

## What changes were proposed in this pull request?

This PR fixes the wrong description of `FilterFunction`.

```
- * If the function returns true, the element is discarded in the returned Dataset.
+ * If the function returns true, the element is included in the returned Dataset.
```

## How was this patch tested?

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark minor_fix_java_api

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13404.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #13404

commit 94f666a54c4865ec2d915ae1a7250506aa836faf
Author: Dongjoon Hyun <dongj...@apache.org>
Date: 2016-05-31T05:31:39Z

    [MINOR][CORE] Fix description of FilterFunction
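A small usage sketch of the corrected semantics (local session, object name, and data are assumed for illustration): returning `true` from `call` keeps the element in the result.

```scala
import org.apache.spark.api.java.function.FilterFunction
import org.apache.spark.sql.SparkSession

object FilterFunctionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").appName("filter-fn").getOrCreate()
    import spark.implicits._
    val evens = Seq(1, 2, 3, 4).toDS().filter(new FilterFunction[Int] {
      override def call(value: Int): Boolean = value % 2 == 0 // true => element is kept
    })
    evens.show() // only 2 and 4 remain
    spark.stop()
  }
}
```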
[GitHub] spark pull request: [SPARK-15076][SQL] Improve ConstantFolding opt...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/12850#discussion_r65127553

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -751,6 +751,16 @@ object ConstantFolding extends Rule[LogicalPlan] {
     // Fold expressions that are foldable.
     case e if e.foldable => Literal.create(e.eval(EmptyRow), e.dataType)
+
+    // Use associative property for integral type
+    case e if e.isInstanceOf[BinaryArithmetic] && e.dataType.isInstanceOf[IntegralType]
+      => e match {
+        case Add(Add(a, b), c) if b.foldable && c.foldable => Add(a, Add(b, c))
--- End diff --

Thank you for the review, @cloud-fan ! I see, that sounds great. Let me think about how to eliminate all constants then.
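To make the re-association concrete, here is a toy standalone sketch (not Catalyst code; the names are assumed): re-ordering a left-nested add makes the two literals adjacent so they fold into one. This is safe for integral types because two's-complement addition stays associative even when it overflows.

```scala
object ReassociateFoldSketch {
  sealed trait Expr
  case class Lit(v: Int) extends Expr
  case class Ref(name: String) extends Expr
  case class Add(l: Expr, r: Expr) extends Expr

  def fold(e: Expr): Expr = e match {
    case Add(Lit(a), Lit(b))         => Lit(a + b)                    // fold adjacent literals
    case Add(Add(x, b: Lit), c: Lit) => fold(Add(x, fold(Add(b, c)))) // (x + b) + c => x + (b + c)
    case Add(l, r)                   => Add(fold(l), fold(r))
    case other                       => other
  }

  def main(args: Array[String]): Unit = {
    // (a + 1) + 2 folds to a + 3 once the literals are adjacent.
    println(fold(Add(Add(Ref("a"), Lit(1)), Lit(2)))) // Add(Ref(a),Lit(3))
  }
}
```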
[GitHub] spark pull request: [SPARK-15660][CORE] RDD and Dataset should sho...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/13403#issuecomment-222614982

Thank you for the review again, @rxin. Actually, I fully understand and expected your decision. The reason I raised this issue is that I think we need an explicit discussion and a conclusion on it. I worried that Spark would otherwise carry this inconsistency implicitly forever. As we know, if we do not do this in Spark 2.0, it will have to wait for Spark 3.0, or may never happen, for the same reason.
[GitHub] spark pull request: [SPARK-15076][SQL] Improve ConstantFolding optimizer by ...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/12850#discussion_r65237290

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -742,6 +742,23 @@ object InferFiltersFromConstraints extends Rule[LogicalPlan] with PredicateHelpe
 * equivalent [[Literal]] values.
 */
 object ConstantFolding extends Rule[LogicalPlan] {
+  private def isAssociativelyFoldable(e: Expression): Boolean =
--- End diff --

Oh, that could be. There is some difference in the level of granularity. Join-related optimizers might be improved later into cost-based optimizers, while the ConstantFolding optimizer is just about removing constants from a single expression. Do you think it is a good idea to put these different levels of concern together? I can do this in any way you decide. :)
[GitHub] spark pull request: [SPARK-15662][SQL] Add since annotation for classes in s...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13406#discussion_r65240101

--- Diff: core/src/main/java/org/apache/spark/api/java/function/package.scala ---
@@ -22,4 +22,4 @@ package org.apache.spark.api.java
 * these interfaces to pass functions to various Java API methods for Spark. Please visit Spark's
 * Java programming guide for more details.
 */
-package object function
+package object function
--- End diff --

Could you take a look at my PR again? Or see:

https://git-wip-us.apache.org/repos/asf?p=spark.git;a=blob;f=core/src/main/java/org/apache/spark/api/java/function/package.scala;h=0f9bac716416264aeba175b90c0b32570bc6dd81;hb=HEAD
[GitHub] spark pull request: [SPARK-15662][SQL] Add since annotation for cl...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13406#discussion_r65136736

--- Diff: core/src/main/java/org/apache/spark/api/java/function/package.scala ---
@@ -22,4 +22,4 @@ package org.apache.spark.api.java
 * these interfaces to pass functions to various Java API methods for Spark. Please visit Spark's
 * Java programming guide for more details.
 */
-package object function
+package object function
--- End diff --

Could you add one last blank line, too? IntelliJ shows one blank line, but it does not exist in the Git repository. So, I added one in my PR.
[GitHub] spark pull request: [SPARK-15662][SQL] Add since annotation for cl...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13406#discussion_r65136837

--- Diff: core/src/main/java/org/apache/spark/api/java/function/package.scala ---
@@ -22,4 +22,4 @@ package org.apache.spark.api.java
 * these interfaces to pass functions to various Java API methods for Spark. Please visit Spark's
 * Java programming guide for more details.
 */
-package object function
+package object function
--- End diff --

I don't know why IntelliJ shows a blank line there, but I used Vim to fix this.
[GitHub] spark pull request: [MINOR][SQL][DOCS] Fix docs of Dataset.scala and SQLImpl...
GitHub user dongjoon-hyun opened a pull request: https://github.com/apache/spark/pull/13420

[MINOR][SQL][DOCS] Fix docs of Dataset.scala and SQLImplicits.scala.

## What changes were proposed in this pull request?

This PR fixes a sample code, a description, and indentations in docs.

## How was this patch tested?

Manual.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark minor_fix_dataset_doc

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13420.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #13420

commit d208bc757f1dc9ee5b29fbbf4675aae82f689185
Author: Dongjoon Hyun <dongj...@apache.org>
Date: 2016-05-31T19:32:49Z

    Fix docs of Dataset and SQLImplicits.
[GitHub] spark pull request: [SPARK-15662][SQL] Add since annotation for classes in s...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13406#discussion_r65249012

--- Diff: core/src/main/java/org/apache/spark/api/java/function/package.scala ---
@@ -22,4 +22,4 @@ package org.apache.spark.api.java
 * these interfaces to pass functions to various Java API methods for Spark. Please visit Spark's
 * Java programming guide for more details.
 */
-package object function
+package object function
--- End diff --

What I meant is we need one blank line at line 26. :)
[GitHub] spark pull request: [SPARK-15618][SQL][MLLIB] Use SparkSession.builder.spark...
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/13365

Hi, @andrewor14 . Could you review this PR?
[GitHub] spark pull request: [SPARK-15678][SQL] Drop cache on appends and overwrites
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13419#discussion_r65251560

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala ---
@@ -67,6 +67,28 @@ class ParquetQuerySuite extends QueryTest with ParquetTest with SharedSQLContext
       TableIdentifier("tmp"), ignoreIfNotExists = true)
   }

+  test("drop cache on overwrite") {
+    withTempDir { dir =>
+      val path = dir.toString
+      spark.range(1000).write.mode("overwrite").parquet(path)
+      val df = sqlContext.read.parquet(path).cache()
+      assert(df.count() == 1000)
+      sqlContext.range(10).write.mode("overwrite").parquet(path)
--- End diff --

sqlContext -> spark
[GitHub] spark pull request: [SPARK-15678][SQL] Drop cache on appends and overwrites
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13419#discussion_r65251574

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala ---
@@ -67,6 +67,28 @@ class ParquetQuerySuite extends QueryTest with ParquetTest with SharedSQLContext
       TableIdentifier("tmp"), ignoreIfNotExists = true)
   }

+  test("drop cache on overwrite") {
+    withTempDir { dir =>
+      val path = dir.toString
+      spark.range(1000).write.mode("overwrite").parquet(path)
+      val df = sqlContext.read.parquet(path).cache()
+      assert(df.count() == 1000)
+      sqlContext.range(10).write.mode("overwrite").parquet(path)
+      assert(sqlContext.read.parquet(path).count() == 10)
+    }
+  }
+
+  test("drop cache on append") {
+    withTempDir { dir =>
+      val path = dir.toString
+      spark.range(1000).write.mode("append").parquet(path)
+      val df = sqlContext.read.parquet(path).cache()
+      assert(df.count() == 1000)
+      sqlContext.range(10).write.mode("append").parquet(path)
--- End diff --

sqlContext -> spark
[GitHub] spark pull request: [SPARK-15678][SQL] Drop cache on appends and overwrites
Github user dongjoon-hyun commented on the pull request: https://github.com/apache/spark/pull/13419

Hi, @sameeragarwal . Is there any reason to use `SQLContext` instead of `SparkSession` in this PR?
[GitHub] spark pull request: [SPARK-15662][SQL] Add since annotation for classes in s...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13406#discussion_r65262391

--- Diff: core/src/main/java/org/apache/spark/api/java/function/package.scala ---
@@ -22,4 +22,4 @@ package org.apache.spark.api.java
 * these interfaces to pass functions to various Java API methods for Spark. Please visit Spark's
 * Java programming guide for more details.
 */
-package object function
+package object function
--- End diff --

It isn't there. The Apache Git repository shows that line 25 is the last one:

```
20 /**
21  * Set of interfaces to represent functions in Spark's Java API. Users create implementations of
22  * these interfaces to pass functions to various Java API methods for Spark. Please visit Spark's
23  * Java programming guide for more details.
24  */
25 package object function
```
[GitHub] spark pull request: [SPARK-15662][SQL] Add since annotation for classes in s...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13406#discussion_r65263287

--- Diff: core/src/main/java/org/apache/spark/api/java/function/package.scala ---
@@ -22,4 +22,4 @@ package org.apache.spark.api.java
 * these interfaces to pass functions to various Java API methods for Spark. Please visit Spark's
 * Java programming guide for more details.
 */
-package object function
+package object function
--- End diff --

In my PR, there is a line 26: https://github.com/apache/spark/pull/13404/files#diff-c8ebb678d9e773dd03e05b0bca473d17R26

Did I miss something? I think I'm bothering you in this PR. :) If you don't want to update this PR, I'll reopen mine again to show it clearly.
[GitHub] spark pull request: [SPARK-15662][SQL] Add since annotation for classes in s...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13406#discussion_r65264316

--- Diff: core/src/main/java/org/apache/spark/api/java/function/package.scala ---
@@ -22,4 +22,4 @@ package org.apache.spark.api.java
 * these interfaces to pass functions to various Java API methods for Spark. Please visit Spark's
 * Java programming guide for more details.
 */
-package object function
+package object function
--- End diff --

@rxin . Yep, it's not worth it. Let's forget about this for now. Please never mind my last comment.
[GitHub] spark pull request: [SPARK-15662][SQL] Add since annotation for classes in s...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13406#discussion_r65264904

--- Diff: core/src/main/java/org/apache/spark/api/java/function/package.scala ---
@@ -22,4 +22,4 @@ package org.apache.spark.api.java
 * these interfaces to pass functions to various Java API methods for Spark. Please visit Spark's
 * Java programming guide for more details.
 */
-package object function
+package object function
--- End diff --

Now I see what you mean; we are talking about different things. Your example is about the `carriage return`. What I meant is `org.scalastyle.file.WhitespaceEndOfLineChecker`. In `src/scala`, please choose one Scala file, delete the last empty line, and run `dev/scalastyle`. You will see the violation.
[GitHub] spark pull request: [SPARK-15662][SQL] Add since annotation for classes in s...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13406#discussion_r65265296

--- Diff: core/src/main/java/org/apache/spark/api/java/function/package.scala ---
@@ -22,4 +22,4 @@ package org.apache.spark.api.java
 * these interfaces to pass functions to various Java API methods for Spark. Please visit Spark's
 * Java programming guide for more details.
 */
-package object function
+package object function
--- End diff --

This file is not covered by Scalastyle since it's in `src/java`. But again, it's not worth wasting your time on. You can merge this.
[GitHub] spark pull request: [SPARK-15662][SQL] Add since annotation for classes in s...
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/13406#discussion_r65269018

--- Diff: core/src/main/java/org/apache/spark/api/java/function/package.scala ---
@@ -22,4 +22,4 @@ package org.apache.spark.api.java
 * these interfaces to pass functions to various Java API methods for Spark. Please visit Spark's
 * Java programming guide for more details.
 */
-package object function
+package object function
--- End diff --

Oh, you're right. I checked out your PR locally and tested it a minute ago. I was completely wrong about this. So sorry!!