[GitHub] spark issue #13487: [SPARK-15744][SQL] Rename two TungstenAggregation*Suites...

2016-06-03 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/13487
  
Thank you, @rxin .





[GitHub] spark pull request #13487: [MINOR][SQL] Update testsuites/comments/error mes...

2016-06-02 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/13487

[MINOR][SQL] Update testsuites/comments/error messages about 
Tungsten/SortBasedAggregate.

## What changes were proposed in this pull request?

For consistency, this PR updates the remaining `TungstenAggregation`/`SortBasedAggregate` references after SPARK-15728.
- Update a comment in codegen in `VectorizedHashMapGenerator.scala`.
- `TungstenAggregationQuerySuite` --> `HashAggregationQuerySuite`
- `TungstenAggregationQueryWithControlledFallbackSuite` --> 
`HashAggregationQueryWithControlledFallbackSuite`
- Update two error messages in `SQLQuerySuite.scala` and 
`AggregationQuerySuite.scala`.
- Update several comments.

## How was this patch tested?

Manual (only comment changes and test suite renamings).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-15744

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13487.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13487


commit 345b1916d8a6dcfc05c2b4958aec71e21138e3e5
Author: Dongjoon Hyun <dongj...@apache.org>
Date:   2016-06-03T00:21:55Z

[MINOR][SQL] Update testsuites/comments/error messages about 
Tungsten/SortBasedAggregate.







[GitHub] spark pull request #13486: [SPARK-15743][SQL] Prevent saving with all-column...

2016-06-02 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/13486

[SPARK-15743][SQL] Prevent saving with all-column partitioning

## What changes were proposed in this pull request?

When saving datasets to storage, `partitionBy` provides an easy way to 
construct the directory structure. However, if a user chooses all columns as 
partition columns, exceptions occur.

- **ORC with all-column partitioning**: `AnalysisException` on **future 
read** due to a schema inference failure.
 ```
scala> spark.range(10).write.format("orc").mode("overwrite").partitionBy("id").save("/tmp/data")


scala> spark.read.format("orc").load("/tmp/data").collect()
org.apache.spark.sql.AnalysisException: Unable to infer schema for ORC at 
/tmp/data. It must be specified manually;
```

- **Parquet with all-column partitioning**: `InvalidSchemaException` on 
**write execution** due to a Parquet limitation.
 ```
scala> spark.range(100).write.format("parquet").mode("overwrite").partitionBy("id").save("/tmp/data")
[Stage 0:> (0 + 8) / 8]
16/06/02 16:51:17 ERROR Utils: Aborting task
org.apache.parquet.schema.InvalidSchemaException: A group type can not be 
empty. Parquet does not support empty group without leaves. Empty group: 
spark_schema
... (lots of error messages)
```

Although some formats like JSON support all-column partitioning without any 
problem, it does not seem like a good idea to create many empty directories. 

This PR prevents saving with all-column partitioning by consistently 
raising an `AnalysisException` before saving, as sketched below. 
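
For illustration, the essence of the check (taken from the diff quoted later in this thread; in the real code it lives inside the `private[sql]` object `PartitioningUtils`, where `AnalysisException` is accessible):

```scala
// schema: the dataset's StructType; partitionColumns: the Seq[String]
// chosen by the user. Fail fast, before any data is written.
if (partitionColumns.size == schema.fields.size) {
  throw new AnalysisException("Cannot use all columns for partition columns")
}
```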

## How was this patch tested?

Newly added `PartitioningUtilsSuite`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-15743

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13486.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13486


commit bb97467dba96604d26d45763f4115152640ff189
Author: Dongjoon Hyun <dongj...@apache.org>
Date:   2016-06-02T23:14:50Z

[SPARK-15743][SQL] Prevent saving with all-column partitioning







[GitHub] spark issue #13403: [SPARK-15660][CORE] RDD and Dataset should show the cons...

2016-06-09 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/13403
  
What about just adding an explicit note on the old `StatCounter.stdev`?


http://spark.apache.org/docs/2.0.0-preview/api/scala/index.html#org.apache.spark.util.StatCounter

MLlib's `stat.Statistics` is also consistent with `Dataset`. 
```
scala> import org.apache.spark.mllib.linalg.Vectors
scala> import org.apache.spark.mllib.stat.{MultivariateStatisticalSummary, 
Statistics}
scala> Statistics.colStats(sc.parallelize(Seq(Vectors.dense(1.0), Vectors.dense(2.0), Vectors.dense(3.0)))).variance
res10: org.apache.spark.mllib.linalg.Vector = [1.0]
```
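
For context, a minimal sketch of the inconsistency under discussion (assuming a 2.0-preview spark-shell where `sc` and `spark` are predefined): the RDD/`StatCounter` methods use the population formula (divide by N), while the `Dataset` aggregate uses the sample formula (divide by N-1).

```scala
import org.apache.spark.sql.functions.variance
import spark.implicits._

val rdd = sc.parallelize(Seq(1.0, 2.0, 3.0))
rdd.variance()  // population variance: ((1-2)^2 + 0 + (3-2)^2) / 3 ≈ 0.667

val ds = Seq(1.0, 2.0, 3.0).toDS()
ds.agg(variance($"value")).show()  // sample variance: 2 / (3-1) = 1.0
```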





[GitHub] spark issue #13403: [SPARK-15660][CORE] RDD and Dataset should show the cons...

2016-06-09 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/13403
  
Although we cannot change the old API, I think it's a good idea to add 
`popVariance` and `popStdev` explicitly.

If everything in this PR is not allowed, what about just adding an explicit 
note on the old `StatCounter.variance` and `StatCounter.stdev`?


http://spark.apache.org/docs/2.0.0-preview/api/scala/index.html#org.apache.spark.util.StatCounter







[GitHub] spark issue #13545: [SPARK-15807][SQL] Support varargs for dropDuplicates in...

2016-06-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/13545
  
Hi, @rxin .
I updated this PR and JIRA by removing `distinct`-related changes.





[GitHub] spark pull request #13486: [SPARK-15743][SQL] Prevent saving with all-column...

2016-06-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13486#discussion_r66349438
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/PartitioningUtilsSuite.scala
 ---
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import org.apache.spark.sql.AnalysisException
+import org.apache.spark.sql.test.SharedSQLContext
+
+class PartitioningUtilsSuite extends SharedSQLContext {
--- End diff --

Sure. No problem. I'll put them into `DataFrameReaderWriterSuite`, too.





[GitHub] spark pull request #13486: [SPARK-15743][SQL] Prevent saving with all-column...

2016-06-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13486#discussion_r66349371
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
 ---
@@ -339,7 +339,7 @@ private[sql] object PartitioningUtils {
   private val upCastingOrder: Seq[DataType] =
 Seq(NullType, IntegerType, LongType, FloatType, DoubleType, StringType)
 
-  def validatePartitionColumnDataTypes(
+  def validatePartitionColumnDataTypesAndCount(
--- End diff --

Thank you for review, @marmbrus .
That sounds better. I'll update that.





[GitHub] spark issue #13486: [SPARK-15743][SQL] Prevent saving with all-column partit...

2016-06-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/13486
  
Hi, @marmbrus .
Now the PR is updated according to your advice and has passed Jenkins 
again.





[GitHub] spark issue #13520: [SPARK-15773][CORE][EXAMPLE] Avoid creating local variab...

2016-06-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/13520
  
Since this is about examples, I think the shorter, the better.
Users can simply think of `parallelize` or `broadcast` as ordinary 
functions without needing to know about `SparkContext`.





[GitHub] spark issue #13520: [SPARK-15773][CORE][EXAMPLE] Avoid creating local variab...

2016-06-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/13520
  
Initially, I thought the printed message was wrong in the statement 
`println("Creating SparkContext")`, because `spark.sparkContext` just 
returns the already existing one.
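
A minimal sketch of that behavior, assuming a standalone application rather than the shell:

```scala
import org.apache.spark.sql.SparkSession

// builder().getOrCreate() constructs the session (and its SparkContext) once.
val spark = SparkSession.builder().appName("example").getOrCreate()

// This is only an accessor; nothing new is created here.
val sc = spark.sparkContext
assert(sc eq spark.sparkContext)  // the same underlying instance
```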





[GitHub] spark issue #13520: [SPARK-15773][CORE][EXAMPLE] Avoid creating local variab...

2016-06-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/13520
  
Thank you, @srowen !





[GitHub] spark issue #13520: [SPARK-15773][CORE][EXAMPLE] Avoid creating local variab...

2016-06-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/13520
  
Thank you for review, @rxin and @srowen .

The main rationale of this PR is to make `SparkSession` explicitly the 
starting point for the operations in these examples (instead of 
`SparkContext`, i.e., `sc`).

Spark naturally uses `'.'` to build a long sequence of operations, e.g., 
`sc.parallelize().map().reduce()` or 
`spark.createDataFrame().toDF().stat.crosstab().show()`. Before 
`SparkSession`, the starting points were `SparkContext` and 
`Dataset`/`DataFrame`/`RDD`.
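
For illustration, a hedged sketch of the chained style described above (assuming a spark-shell session; the data and column names are made up):

```scala
// RDD chain starting from the SparkContext.
val sum = sc.parallelize(1 to 10).map(_ * 2).reduce(_ + _)  // 110

// DataFrame chain starting from the SparkSession.
spark.createDataFrame(Seq((1, "a"), (2, "b"), (2, "a")))
  .toDF("id", "name")
  .stat.crosstab("id", "name")
  .show()
```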

This PR treats `SparkSession` and `Dataset`/`DataFrame`/`RDD` as the 
starting points in these examples and doesn't touch other examples in which 
`sc` is repeated a lot.

Other changes, like replacing `var` with `val`, are unrelated; I can 
revert them.





[GitHub] spark pull request #13545: [SPARK-15807][SQL] Support varargs for distinct/d...

2016-06-07 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13545#discussion_r66152341
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2262,6 +2275,19 @@ class Dataset[T] private[sql](
   def distinct(): Dataset[T] = dropDuplicates()
 
   /**
+   * Returns a new [[Dataset]] that contains only the unique rows from 
this [[Dataset]], considering
+   * only the subset of columns. This is an alias for 
`dropDuplicates(cols)`.
+   *
+   * Note that, equality checking is performed directly on the encoded 
representation of the data
+   * and thus is not affected by a custom `equals` function defined on `T`.
+   *
+   * @group typedrel
+   * @since 2.0.0
+   */
+  @scala.annotation.varargs
+  def distinct(cols: String*): Dataset[T] = dropDuplicates(cols)
--- End diff --

Thank you always for the fast feedback, @rxin . And for the nice lunch. :)

Yes, right. Maybe this one isn't strictly needed, because `distinct` is 
usually used with `select`. 
Also, we can keep using `dropDuplicates`, since `distinct(cols)` is just an 
alias for it.

I think `distinct` is a function name that is more consistent with SQL. If 
we have this, we can also do:
```
ds.select("_1", "_2", "_3").distinct("_1").orderBy("_1", "_2").show()
```





[GitHub] spark issue #13545: [SPARK-15807][SQL] Support varargs for distinct/dropDupl...

2016-06-07 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/13545
  
What do you think about `dropDuplicates`?

1. ds.select("_1", "_2", "_3").dropDuplicates(Seq("_1", "_2")).orderBy("_1", "_2").show()
2. ds.select("_1", "_2", "_3").dropDuplicates("_1", "_2").orderBy("_1", "_2").show()

I think the second is more consistent with the others, `select` and 
`orderBy`.
Do you dislike this one too?





[GitHub] spark pull request #13545: [SPARK-15807][SQL] Support varargs for distinct/d...

2016-06-07 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13545#discussion_r66156310
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2262,6 +2275,19 @@ class Dataset[T] private[sql](
   def distinct(): Dataset[T] = dropDuplicates()
 
   /**
+   * Returns a new [[Dataset]] that contains only the unique rows from 
this [[Dataset]], considering
+   * only the subset of columns. This is an alias for 
`dropDuplicates(cols)`.
+   *
+   * Note that, equality checking is performed directly on the encoded 
representation of the data
+   * and thus is not affected by a custom `equals` function defined on `T`.
+   *
+   * @group typedrel
+   * @since 2.0.0
+   */
+  @scala.annotation.varargs
+  def distinct(cols: String*): Dataset[T] = dropDuplicates(cols)
--- End diff --

In addition, `distinct` in the `dplyr` R package works in the same manner.





[GitHub] spark issue #13486: [SPARK-15743][SQL] Prevent saving with all-column partit...

2016-06-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/13486
  
Hi, @marmbrus .
Could you review this PR?





[GitHub] spark pull request #13634: [SPARK-15913][CORE] Dispatcher.stopped should be ...

2016-06-12 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/13634

[SPARK-15913][CORE] Dispatcher.stopped should be enclosed by synchronized 
block.

## What changes were proposed in this pull request?

`Dispatcher.stopped` is guarded by `this`, but it is used without 
synchronization in the `postMessage` function. This PR fixes that and also 
makes the exception messages more accurate.
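
A minimal sketch of the pattern at issue (simplified; not the actual `Dispatcher` code): a flag documented as guarded by `this` must only be read inside a `synchronized` block, otherwise a write from another thread may not be visible.

```scala
// Simplified illustration of the race, not the real Dispatcher.
class StoppableInbox {
  private var stopped = false  // guarded by `this`

  def stop(): Unit = synchronized { stopped = true }

  // Buggy pattern: reads `stopped` outside the lock, so a concurrent
  // stop() may not be visible to this thread.
  def postUnsafe(msg: String): Unit = {
    if (stopped) throw new IllegalStateException("RpcEnv already stopped")
    // ... enqueue msg ...
  }

  // Fixed pattern: the read happens under the same lock as the write.
  def post(msg: String): Unit = synchronized {
    if (stopped) throw new IllegalStateException("RpcEnv already stopped")
    // ... enqueue msg ...
  }
}
```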

## How was this patch tested?

Passes the existing Jenkins tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-15913

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13634.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13634


commit 75a5254371374faf66f166e1b2683d3f9803cb8e
Author: Dongjoon Hyun <dongj...@apache.org>
Date:   2016-06-13T05:53:47Z

[SPARK-15913][CORE] Dispatcher.stopped should be enclosed by synchronized 
block.







[GitHub] spark issue #13634: [SPARK-15913][CORE] Dispatcher.stopped should be enclose...

2016-06-12 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/13634
  
Hi, @vanzin .
Could you review this when you have some time?





[GitHub] spark issue #13436: [SPARK-15696][SQL] Improve `crosstab` to have a consiste...

2016-06-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/13436
  
Thank you, @rxin .





[GitHub] spark issue #13486: [SPARK-15743][SQL] Prevent saving with all-column partit...

2016-06-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/13486
  
Hi, @marmbrus .
Could you review this PR again?





[GitHub] spark issue #13608: [SPARK-15883][MLLIB][DOCS] Fix broken links in mllib doc...

2016-06-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/13608
  
Actually, this time I manually clicked every link in the mllib 
documentation. Maybe later we can build a simple crawler to check for this 
kind of error, as sketched below.
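
A hypothetical sketch of such a crawler: scan the generated HTML under `docs/_site` and report relative links whose target file does not exist. The directory layout and names here are assumptions, not project code.

```scala
import java.io.File
import scala.io.Source

val siteDir = new File("docs/_site")
val htmlFiles = siteDir.listFiles.filter(_.getName.endsWith(".html"))
val href = """href="([^"#]+)"""".r

for (f <- htmlFiles; line <- Source.fromFile(f).getLines(); m <- href.findAllMatchIn(line)) {
  val target = m.group(1)
  // Only check relative links; external URLs would need an HTTP client.
  if (!target.startsWith("http") && !new File(siteDir, target).exists())
    println(s"${f.getName}: broken link -> $target")
}
```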





[GitHub] spark issue #13486: [SPARK-15743][SQL] Prevent saving with all-column partit...

2016-06-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/13486
  
Thank you, @marmbrus !





[GitHub] spark pull request #13608: [SPARK-15883][MLLIB][DOCS] Fix broken links in ml...

2016-06-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13608#discussion_r66682287
  
--- Diff: docs/mllib-data-types.md ---
@@ -535,12 +537,6 @@ rowsRDD = mat.rows
 
 # Convert to a RowMatrix by dropping the row indices.
 rowMat = mat.toRowMatrix()
-
-# Convert to a CoordinateMatrix.
-coordinateMat = mat.toCoordinateMatrix()
-
-# Convert to a BlockMatrix.
-blockMat = mat.toBlockMatrix()
--- End diff --

This is redundant and inconsistent code that exists only in the `Python` 
part of this section.





[GitHub] spark pull request #13608: [SPARK-15883][MLLIB][DOCS] Fix broken links in ml...

2016-06-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13608#discussion_r66682877
  
--- Diff: docs/mllib-linear-methods.md ---
@@ -185,10 +185,10 @@ algorithm for 200 iterations.
 import org.apache.spark.mllib.optimization.L1Updater
 
 val svmAlg = new SVMWithSGD()
-svmAlg.optimizer.
-  setNumIterations(200).
-  setRegParam(0.1).
-  setUpdater(new L1Updater)
+svmAlg.optimizer
--- End diff --

I changed the trailing-dot ('.') style.





[GitHub] spark issue #13608: [SPARK-15883][MLLIB][DOCS] Fix broken links in mllib doc...

2016-06-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/13608
  
Yep. I already built this with Jekyll locally and checked the result 
manually, too.





[GitHub] spark issue #13608: [SPARK-15883][MLLIB][DOCS] Fix broken links in mllib doc...

2016-06-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/13608
  
Thank you for fast review, @srowen .





[GitHub] spark pull request #13608: [SPARK-15883][MLLIB][DOCS] Fix broken links in ml...

2016-06-10 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/13608

[SPARK-15883][MLLIB][DOCS] Fix broken links in mllib documents

## What changes were proposed in this pull request?

This PR fixes all broken links in the Spark 2.0 preview MLlib documents. 
It also contains some editorial changes.

**Fix broken links**
  * mllib-data-types.md
  * mllib-decision-tree.md
  * mllib-ensembles.md
  * mllib-feature-extraction.md
  * mllib-pmml-model-export.md
  * mllib-statistics.md

**Fix malformed section header and scala coding style**
  * mllib-linear-methods.md

**Replace indirect forward links with direct one**
  * ml-classification-regression.md

## How was this patch tested?

Manual tests (with `cd docs; jekyll build`).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-15883

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13608.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13608


commit 3e4cdc14a386e3a1d8e301995450db255b32486a
Author: Dongjoon Hyun <dongj...@apache.org>
Date:   2016-06-10T21:11:31Z

[SPARK-15883][MLLIB][DOCS] Fix broken links in mllib documents







[GitHub] spark issue #13520: [SPARK-15773][CORE][EXAMPLE] Avoid creating local variab...

2016-06-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/13520
  
Thank you, @rxin !





[GitHub] spark issue #13545: [SPARK-15807][SQL] Support varargs for dropDuplicates in...

2016-06-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/13545
  
Thank you again, @rxin .





[GitHub] spark pull request #13608: [SPARK-15883][MLLIB][DOCS] Fix broken links in ml...

2016-06-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13608#discussion_r66687221
  
--- Diff: docs/mllib-linear-methods.md ---
@@ -395,7 +395,7 @@ section of the Spark
 quick-start guide. Be sure to also include *spark-mllib* to your build 
file as
 a dependency.
 
-###Streaming linear regression
--- End diff --

Currently, this does not render as a section header because there is no space after `###`.





[GitHub] spark issue #13545: [SPARK-15807][SQL] Support varargs for dropDuplicates in...

2016-06-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/13545
  
Hi, @rxin .
For `dropDuplicates`, this PR definitely adds a new signature.
However, I think this is the right direction to improve the user experience, 
because users expect the same usage pattern for `dropDuplicates`.
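
For reference, a short usage sketch of the two call styles (assuming a spark-shell session with implicits imported):

```scala
import spark.implicits._

val ds = Seq((1, "a", true), (1, "a", false), (2, "b", true)).toDS()

// Existing Seq-based API:
ds.dropDuplicates(Seq("_1", "_2")).show()

// Varargs API added by this PR: reads like select/orderBy.
ds.dropDuplicates("_1", "_2").show()
```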






[GitHub] spark pull request #13608: [SPARK-15883][MLLIB][DOCS] Fix broken links in ml...

2016-06-11 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13608#discussion_r66703204
  
--- Diff: docs/mllib-linear-methods.md ---
@@ -185,10 +185,10 @@ algorithm for 200 iterations.
 import org.apache.spark.mllib.optimization.L1Updater
 
 val svmAlg = new SVMWithSGD()
-svmAlg.optimizer.
-  setNumIterations(200).
-  setRegParam(0.1).
-  setUpdater(new L1Updater)
+svmAlg.optimizer
--- End diff --

Thanks. :)





[GitHub] spark issue #13436: [SPARK-15696][SQL] Improve `crosstab` to have a consiste...

2016-06-09 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/13436
  
Hi, @rxin .
Could you review this PR and give some opinion when you have some time?





[GitHub] spark pull request #13486: [SPARK-15743][SQL] Prevent saving with all-column...

2016-06-04 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13486#discussion_r65799702
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
 ---
@@ -350,6 +350,10 @@ private[sql] object PartitioningUtils {
 case _ => throw new AnalysisException(s"Cannot use 
${field.dataType} for partition column")
   }
 }
+
+if (partitionColumns.size == schema.fields.size) {
+  throw new AnalysisException(s"Cannot use all columns for partition 
columns")
+}
   }
--- End diff --

Then, let's change it. :)
Since `PartitioningUtils` is `private[sql]`, it's safe to change.
I'll update this PR. Thank you for your review and idea!





[GitHub] spark issue #13486: [SPARK-15743][SQL] Prevent saving with all-column partit...

2016-06-04 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/13486
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59987/
Test PASSed.





[GitHub] spark pull request #13486: [SPARK-15743][SQL] Prevent saving with all-column...

2016-06-04 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13486#discussion_r65799585
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/PartitioningUtils.scala
 ---
@@ -350,6 +350,10 @@ private[sql] object PartitioningUtils {
 case _ => throw new AnalysisException(s"Cannot use 
${field.dataType} for partition column")
   }
 }
+
+if (partitionColumns.size == schema.fields.size) {
+  throw new AnalysisException(s"Cannot use all columns for partition 
columns")
+}
   }
--- End diff --

Thank you for the attention, @wangyang1992 . Good point!
Maybe `validatePartitionColumnDataTypes` -> 
`validatePartitionColumnDataTypesAndCount`?





[GitHub] spark issue #13486: [SPARK-15743][SQL] Prevent saving with all-column partit...

2016-06-04 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/13486
  
**[Test build #59986 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59986/consoleFull)**
 for PR 13486 at commit 
[`9c5f13d`](https://github.com/apache/spark/commit/9c5f13d6e7c020fb7d983e607116683e4b007f05).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13486: [SPARK-15743][SQL] Prevent saving with all-column partit...

2016-06-04 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/13486
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59986/
Test PASSed.





[GitHub] spark issue #13486: [SPARK-15743][SQL] Prevent saving with all-column partit...

2016-06-04 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/13486
  
Merged build finished. Test PASSed.





[GitHub] spark issue #13486: [SPARK-15743][SQL] Prevent saving with all-column partit...

2016-06-04 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/13486
  
**[Test build #59987 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59987/consoleFull)**
 for PR 13486 at commit 
[`6a9006d`](https://github.com/apache/spark/commit/6a9006d25a1566b4e17021bff7405992f872e6c6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #13486: [SPARK-15743][SQL] Prevent saving with all-column partit...

2016-06-04 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/13486
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [MINOR][CORE] Fix a HadoopRDD log message and ...

2016-05-25 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the pull request:

https://github.com/apache/spark/pull/13294#issuecomment-221656682
  
Thank you, @andrewor14 and @srowen !





[GitHub] spark pull request: [SPARK-15512][CORE] repartition(0) should rais...

2016-05-24 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the pull request:

https://github.com/apache/spark/pull/13282#issuecomment-221426314
  
Yes. They need this. I'll add that.





[GitHub] spark issue #13545: [SPARK-15807][SQL] Support varargs for dropDuplicates in...

2016-06-11 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/13545
  
Thank you for merging, @rxin !





[GitHub] spark issue #13634: [SPARK-15913][CORE] Dispatcher.stopped should be enclose...

2016-06-13 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/13634
  
Thank you for review, @srowen .
Oh, right. That sounds much better to me.
I'll update this PR like that.





[GitHub] spark pull request #13634: [SPARK-15913][CORE] Dispatcher.stopped should be ...

2016-06-13 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13634#discussion_r66761760
  
--- Diff: core/src/main/scala/org/apache/spark/rpc/netty/Dispatcher.scala 
---
@@ -144,24 +144,21 @@ private[netty] class Dispatcher(nettyEnv: 
NettyRpcEnv) extends Logging {
   endpointName: String,
   message: InboxMessage,
   callbackIfStopped: (Exception) => Unit): Unit = {
-val shouldCallOnStop = synchronized {
+val error: Option[Exception] = synchronized {
   val data = endpoints.get(endpointName)
-  if (stopped || data == null) {
-true
+  if (stopped) {
+Some(new RpcEnvStoppedException())
+  } else if (data == null) {
+Some(new SparkException(s"Could not find $endpointName."))
   } else {
 data.inbox.post(message)
 receivers.offer(data)
-false
+None
   }
 }
-if (shouldCallOnStop) {
+if (error.isDefined) {
--- End diff --

Thank you again. I'll change both, too.





[GitHub] spark issue #13634: [SPARK-15913][CORE] Dispatcher.stopped should be enclose...

2016-06-13 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/13634
  
Thank you always, @srowen .





[GitHub] spark issue #13634: [SPARK-15913][CORE] Dispatcher.stopped should be enclose...

2016-06-13 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/13634
  
Thank you, @vanzin !





[GitHub] spark pull request #13643: [SPARK-15922][MLLIB] `toIndexedRowMatrix` should ...

2016-06-13 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/13643

[SPARK-15922][MLLIB] `toIndexedRowMatrix` should consider the case `cols < 
colsPerBlock`

## What changes were proposed in this pull request?

SPARK-15922 reports that the following scenario throws an exception due to 
mismatched vector sizes. This PR handles the corner case `cols < 
colsPerBlock`.

**Before**
```scala
scala> import org.apache.spark.mllib.linalg.distributed._
scala> import org.apache.spark.mllib.linalg._
scala> val rows = IndexedRow(0L, new DenseVector(Array(1,2,3))) :: IndexedRow(1L, new DenseVector(Array(1,2,3))) :: IndexedRow(2L, new DenseVector(Array(1,2,3))) :: Nil
scala> val rdd = sc.parallelize(rows)
scala> val matrix = new IndexedRowMatrix(rdd, 3, 3)
scala> val bmat = matrix.toBlockMatrix
scala> val imat = bmat.toIndexedRowMatrix
scala> imat.rows.collect
... throw exception
```

**After**
```scala
...
scala> imat.rows.collect
res0: Array[org.apache.spark.mllib.linalg.distributed.IndexedRow] = 
Array(IndexedRow(0,[1.0,2.0,3.0]), IndexedRow(1,[1.0,2.0,3.0]), 
IndexedRow(2,[1.0,2.0,3.0]))
```
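
For illustration, a hedged sketch of the boundary condition behind the fix (simplified; not the PR's actual code): when the whole matrix is narrower than `colsPerBlock`, the only block column is a partial block, so the row-vector size must be derived from the matrix dimensions rather than from `colsPerBlock`.

```scala
// Simplified sketch: width of block column `j` in a block matrix with
// `cols` total columns, accounting for a final partial block.
val colsPerBlock = 1024
val cols = 3L  // the matrix in the example above: narrower than one block

def blockColWidth(j: Int): Int =
  math.min(colsPerBlock.toLong, cols - j.toLong * colsPerBlock).toInt

assert(blockColWidth(0) == 3)  // using colsPerBlock blindly would give 1024
```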

## How was this patch tested?

Passes the Jenkins tests (including the above case).

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dongjoon-hyun/spark SPARK-15922

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13643.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13643


commit 85054becae5eb0075620bd674d534ea27a9268b5
Author: Dongjoon Hyun <dongj...@apache.org>
Date:   2016-06-13T18:28:05Z

[SPARK-15922][MLLIB] `toIndexedRowMatrix` should consider the case `cols < 
colsPerBlock`







[GitHub] spark pull request #13684: [SPARK-15908][R] Add varargs-type dropDuplicates(...

2016-06-15 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/13684

[SPARK-15908][R] Add varargs-type dropDuplicates() function in SparkR

## What changes were proposed in this pull request?

This PR adds a varargs-type `dropDuplicates` function to SparkR for API 
parity. 
Refer also to https://issues.apache.org/jira/browse/SPARK-15807.

## How was this patch tested?

Passes the Jenkins tests with new test cases.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-15908

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13684.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13684


commit f1d6355af9dc8e782680a1fc3fac07f8ca31b82b
Author: Dongjoon Hyun <dongj...@apache.org>
Date:   2016-06-15T10:08:28Z

[SPARK-15908][R] Add varargs-type dropDuplicates() function in SparkR







[GitHub] spark issue #13636: [SPARK-15637][SPARK-15931][SPARKR] Fix R masked function...

2016-06-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/13636
  
This passes for me, too. Thank you, @felixcheung .





[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...

2016-06-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13592#discussion_r67202830
  
--- Diff: docs/sql-programming-guide.md ---
@@ -889,7 +887,7 @@ df.select("name", 
"favorite_color").write.save("namesAndFavColors.parquet")
 
 
 {% highlight r %}
-df <- read.df(sqlContext, "examples/src/main/resources/users.parquet")
+df <- read.df(spark, "examples/src/main/resources/users.parquet")
--- End diff --

```
df <- read.df("examples/src/main/resources/users.parquet")
```





[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...

2016-06-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13592#discussion_r67202934
  
--- Diff: docs/sql-programming-guide.md ---
@@ -939,7 +937,7 @@ df.select("name", 
"age").write.save("namesAndAges.parquet", format="parquet")
 
 {% highlight r %}
 
-df <- read.df(sqlContext, "examples/src/main/resources/people.json", 
"json")
+df <- read.df(spark, "examples/src/main/resources/people.json", "json")
--- End diff --

```
df <- read.df("examples/src/main/resources/people.json", "json")
```





[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...

2016-06-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13592#discussion_r67203021
  
--- Diff: docs/sql-programming-guide.md ---
@@ -956,30 +954,30 @@ file directly with SQL.
 
 
 {% highlight scala %}
-val df = sqlContext.sql("SELECT * FROM parquet.`examples/src/main/resources/users.parquet`")
+val df = spark.sql("SELECT * FROM parquet.`examples/src/main/resources/users.parquet`")
 {% endhighlight %}
 
 
 
 
 
 {% highlight java %}
-DataFrame df = sqlContext.sql("SELECT * FROM parquet.`examples/src/main/resources/users.parquet`");
+Dataset<Row> df = spark.sql("SELECT * FROM parquet.`examples/src/main/resources/users.parquet`");
 {% endhighlight %}
 
 
 
 
 {% highlight python %}
-df = sqlContext.sql("SELECT * FROM parquet.`examples/src/main/resources/users.parquet`")
+df = spark.sql("SELECT * FROM parquet.`examples/src/main/resources/users.parquet`")
 {% endhighlight %}
 
 
 
 
 
 {% highlight r %}
-df <- sql(sqlContext, "SELECT * FROM parquet.`examples/src/main/resources/users.parquet`")
+df <- sql(spark, "SELECT * FROM parquet.`examples/src/main/resources/users.parquet`")
--- End diff --

The same here.
```
df <- sql("SELECT * FROM parquet.`examples/src/main/resources/users.parquet`")
```





[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...

2016-06-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13592#discussion_r67203777
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1142,11 +1141,11 @@ write.parquet(schemaPeople, "people.parquet")
 
 # Read in the Parquet file created above. Parquet files are 
self-describing so the schema is preserved.
 # The result of loading a parquet file is also a DataFrame.
-parquetFile <- read.parquet(sqlContext, "people.parquet")
+parquetFile <- read.parquet(spark, "people.parquet")
 
 # Parquet files can also be used to create a temporary view and then used 
in SQL statements.
 registerTempTable(parquetFile, "parquetFile")
--- End diff --

```
createOrReplaceTempView(parquetFile, "parquetFile")
```





[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...

2016-06-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13592#discussion_r67204146
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1142,11 +1141,11 @@ write.parquet(schemaPeople, "people.parquet")
 
 # Read in the Parquet file created above. Parquet files are 
self-describing so the schema is preserved.
 # The result of loading a parquet file is also a DataFrame.
-parquetFile <- read.parquet(sqlContext, "people.parquet")
+parquetFile <- read.parquet(spark, "people.parquet")
 
 # Parquet files can also be used to create a temporary view and then used 
in SQL statements.
 registerTempTable(parquetFile, "parquetFile")
-teenagers <- sql(sqlContext, "SELECT name FROM parquetFile WHERE age >= 13 AND age <= 19")
+teenagers <- sql(spark, "SELECT name FROM parquetFile WHERE age >= 13 AND age <= 19")
--- End diff --

```
teenagers <- sql("SELECT name FROM parquetFile WHERE age >= 13 AND age <= 19")
```





[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...

2016-06-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13592#discussion_r67202307
  
--- Diff: docs/sql-programming-guide.md ---
@@ -171,9 +171,9 @@ df.show()
 
 
 {% highlight r %}
-sqlContext <- SQLContext(sc)
+spark <- SparkSession(sc)
 
-df <- read.json(sqlContext, "examples/src/main/resources/people.json")
+df <- read.json(spark, "examples/src/main/resources/people.json")
--- End diff --

In `SparkR`, the above is deprecated. We can now use the following instead.
```
df <- read.json("examples/src/main/resources/people.json")
```





[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...

2016-06-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13592#discussion_r67202506
  
--- Diff: docs/sql-programming-guide.md ---
@@ -363,10 +363,10 @@ In addition to simple column references and 
expressions, DataFrames also have a
 
 
 {% highlight r %}
-sqlContext <- sparkRSQL.init(sc)
+spark <- sparkRSQL.init(sc)
 
 # Create the DataFrame
-df <- read.json(sqlContext, "examples/src/main/resources/people.json")
+df <- read.json(spark, "examples/src/main/resources/people.json")
--- End diff --

We can remove the following.
```
spark <- sparkRSQL.init(sc)
```
And, use the following.
```
df <- read.json("examples/src/main/resources/people.json")
```





[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...

2016-06-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13592#discussion_r67202611
  
--- Diff: docs/sql-programming-guide.md ---
@@ -419,35 +419,35 @@ In addition to simple column references and 
expressions, DataFrames also have a
 
 ## Running SQL Queries Programmatically
 
-The `sql` function on a `SQLContext` enables applications to run SQL 
queries programmatically and returns the result as a `DataFrame`.
+The `sql` function on a `SparkSession` enables applications to run SQL 
queries programmatically and returns the result as a `DataFrame`.
 
 
 
 {% highlight scala %}
-val sqlContext = ... // An existing SQLContext
-val df = sqlContext.sql("SELECT * FROM table")
+val spark = ... // An existing SparkSession
+val df = spark.sql("SELECT * FROM table")
 {% endhighlight %}
 
 
 
 {% highlight java %}
-SQLContext sqlContext = ... // An existing SQLContext
-DataFrame df = sqlContext.sql("SELECT * FROM table")
+SparkSession spark = ... // An existing SparkSession
+Dataset<Row> df = spark.sql("SELECT * FROM table")
 {% endhighlight %}
 
 
 
 {% highlight python %}
-from pyspark.sql import SQLContext
-sqlContext = SQLContext(sc)
-df = sqlContext.sql("SELECT * FROM table")
+from pyspark.sql import SparkSession
+spark = SparkSession(sc)
+df = spark.sql("SELECT * FROM table")
 {% endhighlight %}
 
 
 
 {% highlight r %}
-sqlContext <- sparkRSQL.init(sc)
-df <- sql(sqlContext, "SELECT * FROM table")
+spark <- sparkRSQL.init(sc)
+df <- sql(spark, "SELECT * FROM table")
--- End diff --

Here, too. Remove `spark <- sparkRSQL.init(sc)` and use
```
df <- sql("SELECT * FROM table")
```





[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...

2016-06-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13592#discussion_r67203431
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1142,11 +1141,11 @@ write.parquet(schemaPeople, "people.parquet")
 
 # Read in the Parquet file created above. Parquet files are 
self-describing so the schema is preserved.
 # The result of loading a parquet file is also a DataFrame.
-parquetFile <- read.parquet(sqlContext, "people.parquet")
+parquetFile <- read.parquet(spark, "people.parquet")
--- End diff --

```
parquetFile <- read.parquet("people.parquet")
```





[GitHub] spark pull request #13592: [SPARK-15863][SQL][DOC] Initial SQL programming g...

2016-06-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13592#discussion_r67204480
  
--- Diff: docs/sql-programming-guide.md ---
@@ -1326,7 +1325,7 @@ write.df(df1, "data/test_table/key=1", "parquet", 
"overwrite")
 write.df(df2, "data/test_table/key=2", "parquet", "overwrite")
 
 # Read the partitioned table
-df3 <- read.df(sqlContext, "data/test_table", "parquet", mergeSchema="true")
+df3 <- read.df(spark, "data/test_table", "parquet", mergeSchema="true")
--- End diff --

```
df3 <- read.df("data/test_table", "parquet", mergeSchema="true")
```





[GitHub] spark issue #13684: [SPARK-15908][R] Add varargs-type dropDuplicates() funct...

2016-06-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/13684
  
Hi, @shivaram .
Could you review this PR?





[GitHub] spark pull request #13684: [SPARK-15908][R] Add varargs-type dropDuplicates(...

2016-06-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13684#discussion_r67206540
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1859,7 +1859,7 @@ setMethod("where",
 #' @param colnames A character vector of column names.
--- End diff --

Oh, thank you for the review, @shivaram .
Sure, I'll update the doc. Maybe something like the following?
```
- #' @param colnames A character vector of column names.
+ #' @param col A character vector of column names or a string of a column name
+ #' @param ... Additional column names
```





[GitHub] spark pull request #13684: [SPARK-15908][R] Add varargs-type dropDuplicates(...

2016-06-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13684#discussion_r67209107
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1859,7 +1859,7 @@ setMethod("where",
 #' @param colnames A character vector of column names.
--- End diff --

Yep. Right. I will add that.





[GitHub] spark pull request #13684: [SPARK-15908][R] Add varargs-type dropDuplicates(...

2016-06-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13684#discussion_r67210755
  
--- Diff: R/pkg/R/DataFrame.R ---
@@ -1869,6 +1869,7 @@ setMethod("where",
 #' path <- "path/to/file.json"
 #' df <- read.json(path)
 #' dropDuplicates(df)
+#' dropDuplicates(df, "col1", "col2")
 #' dropDuplicates(df, c("col1", "col2"))
 #' }
 setMethod("dropDuplicates",
--- End diff --

Actually, I kept the existing `dropDuplicates` since it also handles `dropDuplicates(df)` for all columns.
Don't we still need two functions even if we move the `c(...)` case?





[GitHub] spark issue #13643: [SPARK-15922][MLLIB] `toIndexedRowMatrix` should conside...

2016-06-13 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/13643
  
Thank you again, @srowen .





[GitHub] spark issue #13643: [SPARK-15922][MLLIB] `toIndexedRowMatrix` should conside...

2016-06-13 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/13643
  
Hi, @Fokko and @mengxr .
Could you review this PR when you have some time?





[GitHub] spark pull request #13643: [SPARK-15922][MLLIB] `toIndexedRowMatrix` should ...

2016-06-13 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13643#discussion_r66849603
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/BlockMatrix.scala
 ---
@@ -288,7 +288,7 @@ class BlockMatrix @Since("1.3.0") (
 
   vectors.foreach { case (blockColIdx: Int, vec: BV[Double]) =>
 val offset = colsPerBlock * blockColIdx
-wholeVector(offset until offset + colsPerBlock) := vec
+wholeVector(offset until offset + Math.min(cols, colsPerBlock)) := vec
--- End diff --

Oh, thank you. Mathematically, yours is correct. I'll fix this.
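
For illustration, a minimal sketch of why the clamp matters (plain arrays stand in for the Breeze vectors here, and the sizes are made up):
```scala
val cols = 3            // total number of columns in the matrix
val colsPerBlock = 1024 // block width; larger than `cols` in this case
val wholeVector = new Array[Double](cols)
val blockColIdx = 0
val vec = Array(1.0, 2.0, 3.0) // this row's values in the (only) block

val offset = colsPerBlock * blockColIdx
// With a copy length of `colsPerBlock`, the slice would overrun `wholeVector`
// whenever cols < colsPerBlock; clamping with math.min keeps it in bounds.
Array.copy(vec, 0, wholeVector, offset, math.min(cols, colsPerBlock))
println(wholeVector.mkString(", ")) // 1.0, 2.0, 3.0
```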





[GitHub] spark pull request #13520: [SPARK-15773][CORE][EXAMPLE] Avoid creating local...

2016-06-05 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/13520

[SPARK-15773][CORE][EXAMPLE] Avoid creating local variable `sc` in examples 
if possible

## What changes were proposed in this pull request?

Instead of using a local variable `sc` as in the following example, this PR uses `spark.sparkContext` directly. This makes the examples more concise and also fixes some misleading code, i.e., code that appears to create a SparkContext from a SparkSession.
```
-println("Creating SparkContext")
-val sc = spark.sparkContext
-
 println("Writing local file to DFS")
 val dfsFilename = dfsDirPath + "/dfs_read_write_test"
-val fileRDD = sc.parallelize(fileContents)
+val fileRDD = spark.sparkContext.parallelize(fileContents)
```

This will change 12 files (+30 lines, -52 lines).
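
For illustration, a self-contained sketch of the resulting pattern (the object name and data below are made up, not from the PR):
```scala
import org.apache.spark.sql.SparkSession

object DfsReadWriteSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("DfsReadWriteSketch")
      .master("local[2]") // local master only for this sketch
      .getOrCreate()

    // Use spark.sparkContext directly instead of binding it to a local `sc`.
    val fileRDD = spark.sparkContext.parallelize(Seq("line one", "line two"))
    println(s"count = ${fileRDD.count()}")

    spark.stop()
  }
}
```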

## How was this patch tested?

Manual.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-15773

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13520.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13520


commit 0a5d82fc8c1b3e0910231060090181e143e5215a
Author: Dongjoon Hyun <dongj...@apache.org>
Date:   2016-06-05T21:42:42Z

[SPARK-15773][CORE][EXAMPLE] Avoid creating local variable `sc` in examples 
if possible







[GitHub] spark pull request #13545: [SPARK-15807][SQL] Support varargs for distinct/d...

2016-06-07 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/13545

[SPARK-15807][SQL] Support varargs for distinct/dropDuplicates in 
Dataset/DataFrame

## What changes were proposed in this pull request?
This PR adds varargs-type `distinct`/`dropDuplicates` functions to `Dataset`/`DataFrame`. Currently, `distinct` takes no arguments, and `dropDuplicates` supports only a `Seq` or `Array` of column names.

**Before**
```scala
scala> val ds = spark.createDataFrame(Seq(("a", 1), ("b", 2), ("a", 2)))
ds: org.apache.spark.sql.DataFrame = [_1: string, _2: int]

scala> ds.dropDuplicates(Seq("_1", "_2"))
res0: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [_1: string, _2: int]

scala> ds.dropDuplicates("_1", "_2")
<console>:26: error: overloaded method value dropDuplicates with alternatives:
  (colNames: Array[String])org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] <and>
  (colNames: Seq[String])org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] <and>
  ()org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
 cannot be applied to (String, String)
       ds.dropDuplicates("_1", "_2")
          ^

scala> ds.distinct("_1", "_2")
<console>:26: error: too many arguments for method distinct: ()org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]
              ds.distinct("_1", "_2")
```

**After**
```scala
scala> val ds = spark.createDataFrame(Seq(("a", 1), ("b", 2), ("a", 2)))
ds: org.apache.spark.sql.DataFrame = [_1: string, _2: int]

scala> ds.dropDuplicates("_1", "_2")
res0: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [_1: string, _2: int]

scala> ds.distinct("_1", "_2")
res1: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [_1: string, _2: int]
```
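
The varargs overload can simply delegate to the existing `Seq`-based method. A toy sketch of the pattern (a simplified stand-in, not the actual `Dataset` code):
```scala
final case class MiniDataset(rows: Seq[Map[String, Any]]) {
  // Existing Seq-based API, as in dropDuplicates(colNames: Seq[String]).
  def dropDuplicates(colNames: Seq[String]): MiniDataset = {
    val seen = scala.collection.mutable.HashSet.empty[Seq[Any]]
    MiniDataset(rows.filter(r => seen.add(colNames.map(c => r.getOrElse(c, null)))))
  }
  // New varargs overload: the fixed first parameter keeps it unambiguous
  // against the Seq overload, and it just delegates.
  def dropDuplicates(col1: String, cols: String*): MiniDataset =
    dropDuplicates(col1 +: cols)
}
```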

## How was this patch tested?

Pass the Jenkins tests with new test cases.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark SPARK-15807

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13545.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13545


commit 33f446f4bb04e2ea0014c385b6f0d1b290db5a90
Author: Dongjoon Hyun <dongj...@apache.org>
Date:   2016-06-07T18:34:24Z

[SPARK-15807][SQL] Support varargs for distinct/dropDuplicates







[GitHub] spark pull request: [SPARK-15644] [MLlib] [SQL] Replace SQLContext...

2016-05-28 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the pull request:

https://github.com/apache/spark/pull/13380#issuecomment-222337348
  
Hi, @gatorsmile .
Personally, I love this PR. :)
I just hesitated to change the function signatures of MLLIB in #13352 .





[GitHub] spark pull request: [SPARK-15644] [MLlib] [SQL] Replace SQLContext...

2016-05-28 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13380#discussion_r64996971
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/joins/BroadcastJoinSuite.scala
 ---
@@ -48,7 +48,7 @@ class BroadcastJoinSuite extends QueryTest with 
SQLTestUtils {
   .setMaster("local-cluster[2,1,1024]")
   .setAppName("testing")
 val sc = new SparkContext(conf)
-spark = SparkSession.builder.getOrCreate()
--- End diff --

In your PR, only this line is related.





[GitHub] spark pull request: [SPARK-15644] [MLlib] [SQL] Replace SQLContext...

2016-05-28 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13380#discussion_r64996960
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/joins/BroadcastJoinSuite.scala
 ---
@@ -48,7 +48,7 @@ class BroadcastJoinSuite extends QueryTest with 
SQLTestUtils {
   .setMaster("local-cluster[2,1,1024]")
   .setAppName("testing")
 val sc = new SparkContext(conf)
-spark = SparkSession.builder.getOrCreate()
--- End diff --

FYI, after #13352 , I proceeded to #13365 ([SPARK-15618][SQL][MLLIB] Use 
SparkSession.builder.sparkContext if applicable.)
You can fix the above line like the following.
```
- spark = SparkSession.builder().config(sc.getConf).getOrCreate()
+ spark = SparkSession.builder().sparkContext(sc).getOrCreate()
```
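
For context, a minimal sketch of the setup being discussed, using the public `config(sc.getConf)` route (the `sparkContext(...)` builder method from #13365 is, as far as I know, visible only inside Spark itself):
```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SparkSession

val conf = new SparkConf()
  .setMaster("local[2]") // illustrative; the suite uses local-cluster[2,1,1024]
  .setAppName("testing")
val sc = new SparkContext(conf)
// Derive the SparkSession from the already-created SparkContext.
val spark = SparkSession.builder().config(sc.getConf).getOrCreate()
```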





[GitHub] spark pull request: [SPARK-15644] [MLlib] [SQL] Replace SQLContext...

2016-05-28 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13380#discussion_r64997097
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/joins/BroadcastJoinSuite.scala
 ---
@@ -48,7 +48,7 @@ class BroadcastJoinSuite extends QueryTest with 
SQLTestUtils {
   .setMaster("local-cluster[2,1,1024]")
   .setAppName("testing")
 val sc = new SparkContext(conf)
-spark = SparkSession.builder.getOrCreate()
--- End diff --

Oh, I checked my PR again and found that I had already fixed this there.
Yes, right. You had better revert this line to avoid unnecessary conflicts.





[GitHub] spark pull request: [SPARK-15644] [MLlib] [SQL] Replace SQLContext...

2016-05-28 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13380#discussion_r64997103
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/joins/BroadcastJoinSuite.scala
 ---
@@ -48,7 +48,7 @@ class BroadcastJoinSuite extends QueryTest with 
SQLTestUtils {
   .setMaster("local-cluster[2,1,1024]")
   .setAppName("testing")
 val sc = new SparkContext(conf)
-spark = SparkSession.builder.getOrCreate()
--- End diff --

Here.

https://github.com/apache/spark/pull/13365/files#diff-d8244612a613500ec2c52e9ef0538376R47





[GitHub] spark pull request: [SPARK-15647] [SQL] Fix Boundary Cases in Opti...

2016-05-29 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the pull request:

https://github.com/apache/spark/pull/13392#issuecomment-222346595
  
Thank you for bringing me up to date, @gatorsmile !

By the way, one correction: my PR is about **parameterizing** the following existing code. :)
```
def shouldCodegen: Boolean =
  branches.length < CaseWhen.MAX_NUM_CASES_FOR_CODEGEN
```





[GitHub] spark pull request: [SPARK-15557][SQL] expression ((cast(99 as de...

2016-05-27 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13368#discussion_r64979724
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala
 ---
@@ -290,11 +290,6 @@ object TypeCoercion {
   // Skip nodes who's children have not been resolved yet.
   case e if !e.childrenResolved => e
 
-  case a @ BinaryArithmetic(left @ StringType(), right @ DecimalType.Expression(_, _)) =>
-a.makeCopy(Array(Cast(left, DecimalType.SYSTEM_DEFAULT), right))
-  case a @ BinaryArithmetic(left @ DecimalType.Expression(_, _), right @ StringType()) =>
-a.makeCopy(Array(left, Cast(right, DecimalType.SYSTEM_DEFAULT)))
-
--- End diff --

Hi, @dilipbiswal .
IMHO, the root cause seems to be **decimal multiplication** between 
`decimal(38,18)`s.
```
scala> sql("select cast(10 as decimal(38,18)) * cast(10 as decimal(38,18))").head
res0: org.apache.spark.sql.Row = [null]
```
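
For reference, a sketch of the usual SQL decimal-arithmetic rule that would explain the null (my reading of the behavior, with illustrative numbers):
```scala
// decimal(p1,s1) * decimal(p2,s2) needs precision p1+p2+1 and scale s1+s2.
val (p1, s1) = (38, 18)
val (p2, s2) = (38, 18)
val resultPrecision = p1 + p2 + 1 // 77
val resultScale    = s1 + s2      // 36
val maxPrecision   = 38           // DecimalType's maximum precision
// Capping precision at 38 while keeping scale 36 leaves only 2 integral
// digits, but 10 * 10 = 100 needs 3, so the result overflows to null.
println(s"precision=$resultPrecision scale=$resultScale cap=$maxPrecision")
```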
What do you think about this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [MINOR][CORE][DOCS] Fix description of FilterF...

2016-05-31 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the pull request:

https://github.com/apache/spark/pull/13404#issuecomment-222611700
  
Thank you, @rxin !
Then, I'll close this PR now. 





[GitHub] spark pull request: [MINOR][CORE][DOCS] Fix description of FilterF...

2016-05-30 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13404#discussion_r65125923
  
--- Diff: 
core/src/main/java/org/apache/spark/api/java/function/package.scala ---
@@ -22,4 +22,5 @@ package org.apache.spark.api.java
  * these interfaces to pass functions to various Java API methods for 
Spark. Please visit Spark's
  * Java programming guide for more details.
  */
-package object function 
--- End diff --

This just removes one trailing space and adds one final blank line.





[GitHub] spark pull request: [MINOR][CORE][DOCS] Fix description of FilterF...

2016-05-31 Thread dongjoon-hyun
Github user dongjoon-hyun closed the pull request at:

https://github.com/apache/spark/pull/13404





[GitHub] spark pull request: [MINOR][CORE][DOC] Fix description of FilterFun...

2016-05-30 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/13404

[MINOR][CORE][DOC] Fix description of FilterFunction

## What changes were proposed in this pull request?

This PR fixes the wrong description of `FilterFunction`.
```
- * If the function returns true, the element is discarded in the returned Dataset.
+ * If the function returns true, the element is included in the returned Dataset.
```
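
A minimal usage sketch of the corrected semantics (illustrative only, not part of the patch):
```scala
import org.apache.spark.api.java.function.FilterFunction
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("filter-sketch").master("local[2]").getOrCreate()
import spark.implicits._

val ds = Seq("a", "", "bb").toDS()
// Elements for which call(...) returns true are *included* in the result.
val nonEmpty = ds.filter(new FilterFunction[String] {
  override def call(value: String): Boolean = value.nonEmpty
})
nonEmpty.show() // shows "a" and "bb"
```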

## How was this patch tested?




You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark minor_fix_java_api

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13404.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13404


commit 94f666a54c4865ec2d915ae1a7250506aa836faf
Author: Dongjoon Hyun <dongj...@apache.org>
Date:   2016-05-31T05:31:39Z

[MINOR][CORE] Fix description of FilterFunction







[GitHub] spark pull request: [SPARK-15076][SQL] Improve ConstantFolding opt...

2016-05-31 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/12850#discussion_r65127553
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -751,6 +751,16 @@ object ConstantFolding extends Rule[LogicalPlan] {
 
   // Fold expressions that are foldable.
   case e if e.foldable => Literal.create(e.eval(EmptyRow), e.dataType)
+
+  // Use associative property for integral type
+  case e if e.isInstanceOf[BinaryArithmetic] && e.dataType.isInstanceOf[IntegralType]
+    => e match {
+      case Add(Add(a, b), c) if b.foldable && c.foldable => Add(a, Add(b, c))
--- End diff --

Thank you for the review, @cloud-fan ! 
I see. That sounds great.
Let me think about how to eliminate all the constants, then.
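
For instance, a toy sketch of the reassociation idea on a self-contained expression tree (not Catalyst itself; the rule is restricted to integral types because floating-point addition is not associative):
```scala
sealed trait Expr { def foldable: Boolean = this.isInstanceOf[Lit] }
final case class Lit(v: Long) extends Expr
final case class Ref(name: String) extends Expr
final case class Add(l: Expr, r: Expr) extends Expr

// Fold two literals into one.
def fold(e: Expr): Expr = e match {
  case Add(Lit(x), Lit(y)) => Lit(x + y)
  case other => other
}

// Rotate (a + b) + c into a + (b + c) when b and c are foldable, then fold.
def reassociate(e: Expr): Expr = e match {
  case Add(Add(a, b), c) if b.foldable && c.foldable =>
    reassociate(Add(a, fold(Add(b, c))))
  case other => other
}

// (x + 1) + 2 becomes x + 3:
println(reassociate(Add(Add(Ref("x"), Lit(1)), Lit(2)))) // Add(Ref(x),Lit(3))
```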





[GitHub] spark pull request: [SPARK-15660][CORE] RDD and Dataset should sho...

2016-05-31 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the pull request:

https://github.com/apache/spark/pull/13403#issuecomment-222614982
  
Thank you for the review again, @rxin. 

Actually, I fully understand and expected your decision.
The reason I raised this issue is that I think we need an explicit discussion and a conclusion on it.

I worried that Spark would otherwise carry this inconsistency forever, implicitly. As we know, if we do not do this in Spark 2.0, it will be deferred to Spark 3.0, or may never happen, for the same reason.





[GitHub] spark pull request: [SPARK-15076][SQL] Improve ConstantFolding optimizer by ...

2016-05-31 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/12850#discussion_r65237290
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -742,6 +742,23 @@ object InferFiltersFromConstraints extends 
Rule[LogicalPlan] with PredicateHelpe
  * equivalent [[Literal]] values.
  */
 object ConstantFolding extends Rule[LogicalPlan] {
+  private def isAssociativelyFoldable(e: Expression): Boolean =
--- End diff --

Oh, that could be. 

There is some difference in the level of granularity, though.

Join-related optimizers might later be improved into cost-based optimizers, while the ConstantFolding optimizer is just about removing constants from a single expression.

Do you think it is a good idea to put these different levels of concern together?

I can do this whichever way you decide. :)





[GitHub] spark pull request: [SPARK-15662][SQL] Add since annotation for classes in s...

2016-05-31 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13406#discussion_r65240101
  
--- Diff: 
core/src/main/java/org/apache/spark/api/java/function/package.scala ---
@@ -22,4 +22,4 @@ package org.apache.spark.api.java
  * these interfaces to pass functions to various Java API methods for 
Spark. Please visit Spark's
  * Java programming guide for more details.
  */
-package object function 
+package object function
--- End diff --

Could you take a look at my PR again? Or, see:


https://git-wip-us.apache.org/repos/asf?p=spark.git;a=blob;f=core/src/main/java/org/apache/spark/api/java/function/package.scala;h=0f9bac716416264aeba175b90c0b32570bc6dd81;hb=HEAD





[GitHub] spark pull request: [SPARK-15662][SQL] Add since annotation for cl...

2016-05-31 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13406#discussion_r65136736
  
--- Diff: 
core/src/main/java/org/apache/spark/api/java/function/package.scala ---
@@ -22,4 +22,4 @@ package org.apache.spark.api.java
  * these interfaces to pass functions to various Java API methods for 
Spark. Please visit Spark's
  * Java programming guide for more details.
  */
-package object function 
+package object function
--- End diff --

Could you add one last blank line, too?
IntelliJ shows one blank line, but it does not exist in the Git repository,
so I added one in my PR.





[GitHub] spark pull request: [SPARK-15662][SQL] Add since annotation for cl...

2016-05-31 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13406#discussion_r65136837
  
--- Diff: 
core/src/main/java/org/apache/spark/api/java/function/package.scala ---
@@ -22,4 +22,4 @@ package org.apache.spark.api.java
  * these interfaces to pass functions to various Java API methods for 
Spark. Please visit Spark's
  * Java programming guide for more details.
  */
-package object function 
+package object function
--- End diff --

I don't know why IntelliJ shows a blank line there, but I used Vim to fix this.





[GitHub] spark pull request: [MINOR][SQL][DOCS] Fix docs of Dataset.scala and SQLImpl...

2016-05-31 Thread dongjoon-hyun
GitHub user dongjoon-hyun opened a pull request:

https://github.com/apache/spark/pull/13420

[MINOR][SQL][DOCS] Fix docs of Dataset.scala and SQLImplicits.scala.

## What changes were proposed in this pull request?

This PR fixes a code sample, a description, and indentation in the docs.

## How was this patch tested?

Manual.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/dongjoon-hyun/spark minor_fix_dataset_doc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13420.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13420


commit d208bc757f1dc9ee5b29fbbf4675aae82f689185
Author: Dongjoon Hyun <dongj...@apache.org>
Date:   2016-05-31T19:32:49Z

Fix docs of Dataset and SQLImplicits.







[GitHub] spark pull request: [SPARK-15662][SQL] Add since annotation for classes in s...

2016-05-31 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13406#discussion_r65249012
  
--- Diff: 
core/src/main/java/org/apache/spark/api/java/function/package.scala ---
@@ -22,4 +22,4 @@ package org.apache.spark.api.java
  * these interfaces to pass functions to various Java API methods for 
Spark. Please visit Spark's
  * Java programming guide for more details.
  */
-package object function 
+package object function
--- End diff --

What I meant is that we need a blank line at line 26. :)





[GitHub] spark pull request: [SPARK-15618][SQL][MLLIB] Use SparkSession.builder.spark...

2016-05-31 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the pull request:

https://github.com/apache/spark/pull/13365
  
Hi, @andrewor14 .
Could you review this PR?





[GitHub] spark pull request: [SPARK-15678][SQL] Drop cache on appends and overwrites

2016-05-31 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13419#discussion_r65251560
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala
 ---
@@ -67,6 +67,28 @@ class ParquetQuerySuite extends QueryTest with 
ParquetTest with SharedSQLContext
   TableIdentifier("tmp"), ignoreIfNotExists = true)
   }
 
+  test("drop cache on overwrite") {
+withTempDir { dir =>
+  val path = dir.toString
+  spark.range(1000).write.mode("overwrite").parquet(path)
+  val df = sqlContext.read.parquet(path).cache()
+  assert(df.count() == 1000)
+  sqlContext.range(10).write.mode("overwrite").parquet(path)
--- End diff --

sqlContext -> spark





[GitHub] spark pull request: [SPARK-15678][SQL] Drop cache on appends and overwrites

2016-05-31 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13419#discussion_r65251574
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala
 ---
@@ -67,6 +67,28 @@ class ParquetQuerySuite extends QueryTest with 
ParquetTest with SharedSQLContext
   TableIdentifier("tmp"), ignoreIfNotExists = true)
   }
 
+  test("drop cache on overwrite") {
+withTempDir { dir =>
+  val path = dir.toString
+  spark.range(1000).write.mode("overwrite").parquet(path)
+  val df = sqlContext.read.parquet(path).cache()
+  assert(df.count() == 1000)
+  sqlContext.range(10).write.mode("overwrite").parquet(path)
+  assert(sqlContext.read.parquet(path).count() == 10)
+}
+  }
+
+  test("drop cache on append") {
+withTempDir { dir =>
+  val path = dir.toString
+  spark.range(1000).write.mode("append").parquet(path)
+  val df = sqlContext.read.parquet(path).cache()
+  assert(df.count() == 1000)
+  sqlContext.range(10).write.mode("append").parquet(path)
--- End diff --

sqlContext -> spark





[GitHub] spark pull request: [SPARK-15678][SQL] Drop cache on appends and overwrites

2016-05-31 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the pull request:

https://github.com/apache/spark/pull/13419
  
Hi, @sameeragarwal .
Is there any reason to use `SQLContext` instead of `SparkSession` in this 
PR?





[GitHub] spark pull request: [SPARK-15662][SQL] Add since annotation for classes in s...

2016-05-31 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13406#discussion_r65262391
  
--- Diff: 
core/src/main/java/org/apache/spark/api/java/function/package.scala ---
@@ -22,4 +22,4 @@ package org.apache.spark.api.java
  * these interfaces to pass functions to various Java API methods for 
Spark. Please visit Spark's
  * Java programming guide for more details.
  */
-package object function 
+package object function
--- End diff --

It isn't there. The Apache Git repository shows that line 25 is the last one.
```
  20 /**
  21  * Set of interfaces to represent functions in Spark's Java API. Users create implementations of
  22  * these interfaces to pass functions to various Java API methods for Spark. Please visit Spark's
  23  * Java programming guide for more details.
  24  */
  25 package object function 
```





[GitHub] spark pull request: [SPARK-15662][SQL] Add since annotation for classes in s...

2016-05-31 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13406#discussion_r65263287
  
--- Diff: 
core/src/main/java/org/apache/spark/api/java/function/package.scala ---
@@ -22,4 +22,4 @@ package org.apache.spark.api.java
  * these interfaces to pass functions to various Java API methods for 
Spark. Please visit Spark's
  * Java programming guide for more details.
  */
-package object function 
+package object function
--- End diff --

In my PR, there is line 26.


https://github.com/apache/spark/pull/13404/files#diff-c8ebb678d9e773dd03e05b0bca473d17R26

Did I miss something? I think I'm bothering you in this PR. :)
If you don't want to update this PR, I'll reopen mine again to show it clearly.





[GitHub] spark pull request: [SPARK-15662][SQL] Add since annotation for classes in s...

2016-05-31 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13406#discussion_r65264316
  
--- Diff: 
core/src/main/java/org/apache/spark/api/java/function/package.scala ---
@@ -22,4 +22,4 @@ package org.apache.spark.api.java
  * these interfaces to pass functions to various Java API methods for 
Spark. Please visit Spark's
  * Java programming guide for more details.
  */
-package object function 
+package object function
--- End diff --

@rxin . Yep, it's not worth it.
Let's forget about this for now.
Please never mind my last comment.





[GitHub] spark pull request: [SPARK-15662][SQL] Add since annotation for classes in s...

2016-05-31 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13406#discussion_r65264904
  
--- Diff: 
core/src/main/java/org/apache/spark/api/java/function/package.scala ---
@@ -22,4 +22,4 @@ package org.apache.spark.api.java
  * these interfaces to pass functions to various Java API methods for 
Spark. Please visit Spark's
  * Java programming guide for more details.
  */
-package object function 
+package object function
--- End diff --

Now I see what you mean; we are talking about different things.
Your example is about the `carriage return`.
What I meant is `org.scalastyle.file.WhitespaceEndOfLineChecker`. 
In `src/scala`, please choose one Scala file, delete the last empty line, and run `dev/scalastyle`.
You will see the violation.





[GitHub] spark pull request: [SPARK-15662][SQL] Add since annotation for classes in s...

2016-05-31 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13406#discussion_r65265296
  
--- Diff: 
core/src/main/java/org/apache/spark/api/java/function/package.scala ---
@@ -22,4 +22,4 @@ package org.apache.spark.api.java
  * these interfaces to pass functions to various Java API methods for 
Spark. Please visit Spark's
  * Java programming guide for more details.
  */
-package object function 
+package object function
--- End diff --

This file is not covered by Scalastyle since it's in `src/java`.
But again, it's not worth wasting your time on. You can merge this.





[GitHub] spark pull request: [SPARK-15662][SQL] Add since annotation for classes in s...

2016-05-31 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/13406#discussion_r65269018
  
--- Diff: 
core/src/main/java/org/apache/spark/api/java/function/package.scala ---
@@ -22,4 +22,4 @@ package org.apache.spark.api.java
  * these interfaces to pass functions to various Java API methods for 
Spark. Please visit Spark's
  * Java programming guide for more details.
  */
-package object function 
+package object function
--- End diff --

Oh, you're right. I checked out your PR locally and tested it a minute ago.
I was completely wrong about this. So sorry!!




