[GitHub] spark pull request: [SPARK-11562][SQL] Provide option to switch Sq...

2016-05-12 Thread xguo27
Github user xguo27 commented on the pull request:

https://github.com/apache/spark/pull/9553#issuecomment-218855824
  
Yeah sure @andrewor14 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11562][SQL] Provide option to switch Sq...

2016-05-12 Thread xguo27
Github user xguo27 closed the pull request at:

https://github.com/apache/spark/pull/9553





[GitHub] spark pull request: [SPARK-12462][SQL] Add ExpressionDescription t...

2016-05-02 Thread xguo27
Github user xguo27 closed the pull request at:

https://github.com/apache/spark/pull/10437





[GitHub] spark pull request: [SPARK-12981][SQL] Fix Python UDF extraction f...

2016-04-02 Thread xguo27
Github user xguo27 closed the pull request at:

https://github.com/apache/spark/pull/10935





[GitHub] spark pull request: [SPARK-12981][SQL] Fix Python UDF extraction f...

2016-04-02 Thread xguo27
Github user xguo27 commented on the pull request:

https://github.com/apache/spark/pull/10935#issuecomment-204840629
  
Sure @davies . I will close this PR.





[GitHub] spark pull request: [SPARK-12981][SQL] Fix Python UDF extraction f...

2016-02-29 Thread xguo27
Github user xguo27 commented on the pull request:

https://github.com/apache/spark/pull/10935#issuecomment-190554648
  
Using these two functionally equivalent code snippets:

Scala
```scala
import org.apache.spark.sql.functions.col
import sqlContext.implicits._

val data = Seq((1, "1"), (2, "2"), (3, "2"), (1, "3")).toDF("a", "b")
val my_filter = sqlContext.udf.register("my_filter", (a: Int) => a == 1)
data.select(col("a")).distinct().filter(my_filter(col("a")))
```

Python
```python
from pyspark.sql.functions import col, udf
from pyspark.sql.types import BooleanType

data = sqlContext.createDataFrame([(1, "1"), (2, "2"), (3, "2"), (1, "3")],
                                  ["a", "b"])
my_filter = udf(lambda a: a == 1, BooleanType())
data.select(col("a")).distinct().filter(my_filter(col("a")))
```

The logical plan that comes out of `execute(aggregateCondition)` here is shown 
below:


https://github.com/apache/spark/blob/916fc34f98dd731f607d9b3ed657bad6cc30df2c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L801

Scala
```
Aggregate [a#8], [UDF(a#8) AS havingCondition#11]
+- Project [a#8]
   +- Project [_1#6 AS a#8,_2#7 AS b#9]
      +- LocalRelation [_1#6,_2#7], [[1,1],[2,2],[3,2],[1,3]]
```

Python
```
Project [havingCondition#2]
+- Aggregate [a#0L], [pythonUDF#3 AS havingCondition#2]
   +- EvaluatePython PythonUDF#(a#0L), pythonUDF#3: boolean
      +- Project [a#0L]
         +- LogicalRDD [a#0L,b#1], MapPartitionsRDD[4] at applySchemaToPythonRDD at NativeMethodAccessorImpl.java:-2
```
We can see that in Python's case, an extra Project is injected when 
`execute(aggregateCondition)` goes through ExtractPythonUDFs, but 
ResolveAggregateFunctions expects an Aggregate here:


https://github.com/apache/spark/blob/916fc34f98dd731f607d9b3ed657bad6cc30df2c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala#L801-L805


With this fix, the logical plan generated for Python UDFs no longer wraps an 
Aggregate in a Project, making it consistent with its Scala counterpart and 
giving ResolveAggregateFunctions the correct plan to consume:

After fix, Python:
```
Aggregate [a#0L], [pythonUDF#3 AS havingCondition#2]
+- EvaluatePython PythonUDF#(a#0L), pythonUDF#3: boolean
   +- Project [a#0L]
      +- LogicalRDD [a#0L,b#1], MapPartitionsRDD[4] at applySchemaToPythonRDD at NativeMethodAccessorImpl.java:-2
```





[GitHub] spark pull request: [SPARK-12981][SQL] Fix Python UDF extraction f...

2016-02-26 Thread xguo27
Github user xguo27 commented on the pull request:

https://github.com/apache/spark/pull/10935#issuecomment-189430834
  
@rxin Does this fix look good to you?





[GitHub] spark pull request: [SPARK-13422][SQL] Use HashedRelation instead ...

2016-02-21 Thread xguo27
Github user xguo27 commented on the pull request:

https://github.com/apache/spark/pull/11291#issuecomment-186923232
  
@hvanhovell I just rebased with your new PR, do you mind reviewing again?





[GitHub] spark pull request: [SPARK-13422][SQL] Use HashedRelation instead ...

2016-02-21 Thread xguo27
Github user xguo27 commented on the pull request:

https://github.com/apache/spark/pull/11291#issuecomment-186893918
  
@hvanhovell In the hashSemiJoin() function, when the condition is empty, the 
boundCondition always evaluates to true here:


https://github.com/apache/spark/blob/8f744fe3d931c2380613b8e5bafa1bb1fd292839/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashSemiJoin.scala#L42-L43

so the exists{...} part of these lines behaves as a no-op.


https://github.com/apache/spark/blob/8f744fe3d931c2380613b8e5bafa1bb1fd292839/sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashSemiJoin.scala#L87-L89
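The point above can be sketched in a few lines of Python (hypothetical names, not Spark's actual code), showing why an always-true bound condition reduces the exists check to a plain non-empty test:

```python
# Sketch only: models HashSemiJoin's behavior when no join condition is
# supplied. The bound predicate defaults to always-true, so the
# exists{...} over matched rows is equivalent to "any match exists".
def semi_join_keep_row(matches, condition=None):
    # Mirrors boundCondition: with no condition, the predicate is true
    # for every row.
    bound_condition = condition if condition is not None else (lambda row: True)
    # exists{...}: with the always-true predicate this is just a
    # non-emptiness check on the match list.
    return any(bound_condition(row) for row in matches)

print(semi_join_keep_row([(1, "a"), (2, "b")]))  # True: at least one match
print(semi_join_keep_row([]))                    # False: no matches at all
```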





[GitHub] spark pull request: [SPARK-13422][SQL] Use HashedRelation instead ...

2016-02-21 Thread xguo27
Github user xguo27 commented on the pull request:

https://github.com/apache/spark/pull/11291#issuecomment-186878372
  
@hvanhovell  I see, sorry for my lack of patience. : )





[GitHub] spark pull request: [SPARK-13422][SQL] Use HashedRelation instead ...

2016-02-21 Thread xguo27
Github user xguo27 commented on the pull request:

https://github.com/apache/spark/pull/11291#issuecomment-186876120
  
Looks like the command did not trigger a test?





[GitHub] spark pull request: [SPARK-13422][SQL] Use HashedRelation instead ...

2016-02-21 Thread xguo27
Github user xguo27 commented on the pull request:

https://github.com/apache/spark/pull/11291#issuecomment-186872660
  
@hvanhovell Could you please advise whether this is the right fix? All Left 
Semi-related tests passed, but I'm not sure what other impact removing the 
HashSet-related methods might have.





[GitHub] spark pull request: [SPARK-13422][SQL] Use HashedRelation instead ...

2016-02-21 Thread xguo27
GitHub user xguo27 opened a pull request:

https://github.com/apache/spark/pull/11291

[SPARK-13422][SQL] Use HashedRelation instead of HashSet in Left Semi Joins



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xguo27/spark SPARK-13422

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11291.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11291


commit a84975a5fcee4b59cd144e23cca806970dc58164
Author: Xiu Guo <xgu...@gmail.com>
Date:   2016-02-21T17:10:01Z

[SPARK-13422][SQL] Use HashedRelation instead of HashSet in Left Semi Joins







[GitHub] spark pull request: [SPARK-13366] Support Cartesian join for Datas...

2016-02-20 Thread xguo27
Github user xguo27 commented on the pull request:

https://github.com/apache/spark/pull/11244#issuecomment-186733828
  
Thanks @marmbrus ! I have updated the change following your suggestion.





[GitHub] spark pull request: [SPARK-13366] Support Cartesian join for Datas...

2016-02-17 Thread xguo27
Github user xguo27 commented on a diff in the pull request:

https://github.com/apache/spark/pull/11244#discussion_r53263079
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -680,6 +681,14 @@ class Dataset[T] private[sql](
 joinWith(other, condition, "inner")
   }
 
+  /**
+   * Joins this [[Dataset]] returning a [[Tuple2]] for each pair using 
cartesian join
+   * Note: cartesian joins are very expensive without a filter that can be 
pushed down.
+   *
+   * @since 2.0.0
+   */
+  def joinWith[U](other: Dataset[U]): Dataset[(T, U)] = joinWith(other, 
lit(true), "inner")
--- End diff --

Thanks for your feedback @marmbrus . The only join API in Dataset I can 
find is:


https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L644

which expects a Column. Do you mean to add some other method like the one 
in Dataframe:


https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrame.scala#L383-L385

If so, I'm wondering whether we need to refactor out the code that handles 
encoder?
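The idea behind the proposed `joinWith(other)` overload can be sketched in plain Python (hypothetical helper, not Spark's API): a Cartesian join is simply a join whose condition is always true, which is what delegating to `joinWith(other, lit(true), "inner")` expresses.

```python
# Minimal sketch: the default always-true condition yields the full
# cross product; any other condition filters the pairs.
def join_with(left, right, condition=lambda a, b: True):
    # Keep every (a, b) pair for which the condition holds.
    return [(a, b) for a in left for b in right if condition(a, b)]

print(join_with([1, 2], ["x", "y"]))
# [(1, 'x'), (1, 'y'), (2, 'x'), (2, 'y')]
```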





[GitHub] spark pull request: [SPARK-13366] Support Cartesian join for Datas...

2016-02-17 Thread xguo27
GitHub user xguo27 opened a pull request:

https://github.com/apache/spark/pull/11244

[SPARK-13366] Support Cartesian join for Datasets



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xguo27/spark SPARK-13366

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11244.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11244


commit 27a58df5a07138fc320353dce532955e8abee00d
Author: Xiu Guo <xgu...@gmail.com>
Date:   2016-02-17T22:16:51Z

[SPARK-13366] Support Cartesian join for Datasets







[GitHub] spark pull request: [SPARK-13283][SQL] Escape column names based o...

2016-02-16 Thread xguo27
Github user xguo27 commented on the pull request:

https://github.com/apache/spark/pull/11224#issuecomment-184916252
  
Yes @JoshRosen , you are referring to integration test, right?





[GitHub] spark pull request: [SPARK-13283][SQL] Escape column names based o...

2016-02-16 Thread xguo27
GitHub user xguo27 opened a pull request:

https://github.com/apache/spark/pull/11224

[SPARK-13283][SQL] Escape column names based on JdbcDialect



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xguo27/spark SPARK-13283

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11224.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11224


commit 8e143b68c102ec1b55d9e7a64ddf3ea40a95d28a
Author: Xiu Guo <xgu...@gmail.com>
Date:   2016-02-16T21:53:27Z

[SPARK-13283][SQL] Escape column names based on JdbcDialect







[GitHub] spark pull request: [SPARK-12981][SQL] Fix Python UDF extraction f...

2016-01-26 Thread xguo27
GitHub user xguo27 opened a pull request:

https://github.com/apache/spark/pull/10935

[SPARK-12981][SQL] Fix Python UDF extraction for aggregation.

When the ExtractPythonUDFs rule is applied to an Aggregate operator, the 
operator becomes a Project. This change fixes that and keeps the Aggregate 
operator as its original type.
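The shape of the fix can be sketched as follows (plan nodes modeled as tuples; the names are illustrative, not Spark's actual API): after extracting Python UDFs, an Aggregate stays an Aggregate instead of being re-wrapped in a Project.

```python
# Hypothetical sketch of the rule after the fix: the operator type is
# preserved, which is what ResolveAggregateFunctions expects to find.
def extract_python_udfs(plan):
    node_type, exprs, child = plan
    # The Python UDF is pulled out into an EvaluatePython node below.
    rewritten_child = ("EvaluatePython", child)
    if node_type == "Aggregate":
        # Before the fix, this branch produced a Project wrapper instead.
        return ("Aggregate", exprs, rewritten_child)
    return ("Project", exprs, rewritten_child)

plan = extract_python_udfs(("Aggregate", ["pythonUDF AS havingCondition"], "relation"))
print(plan[0])  # Aggregate
```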

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xguo27/spark SPARK-12981

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10935.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10935


commit d55146f6dd865bff9789a32641de6aa1678b912f
Author: Xiu Guo <xgu...@gmail.com>
Date:   2016-01-26T22:35:50Z

[SPARK-12981][SQL] Fix Python UDF extraction for aggregation.







[GitHub] spark pull request: [SPARK-12562][SQL] DataFrame.write.format(text...

2016-01-04 Thread xguo27
Github user xguo27 commented on a diff in the pull request:

https://github.com/apache/spark/pull/10515#discussion_r48788553
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/text/DefaultSource.scala
 ---
@@ -70,15 +70,16 @@ class DefaultSource extends HadoopFsRelationProvider 
with DataSourceRegister {
 
 private[sql] class TextRelation(
 val maybePartitionSpec: Option[PartitionSpec],
+val textSchema: Option[StructType],
 override val userDefinedPartitionColumns: Option[StructType],
 override val paths: Array[String] = Array.empty[String],
 parameters: Map[String, String] = Map.empty[String, String])
 (@transient val sqlContext: SQLContext)
   extends HadoopFsRelation(maybePartitionSpec, parameters) {
 
-  /** Data schema is always a single column, named "value". */
-  override def dataSchema: StructType = new StructType().add("value", 
StringType)
-
+  /** Data schema is always a single column, named "value" if original 
Data source has no schema. */
+  override def dataSchema: StructType =
+textSchema.getOrElse(new StructType().add("value", StringType))
--- End diff --

@cloud-fan DefaultSource.scala is the only place that creates a TextRelation, 
and it verifies that the schema has exactly one column of string type before 
creating one. So I think it is fine not to verify again here. What do you 
think?
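The check described above can be sketched like this (hypothetical function, not Spark's code): the schema must be exactly one column of string type before a TextRelation is constructed.

```python
# Illustrative sketch: schema modeled as a list of (name, data_type)
# pairs; the text source accepts only a single string column.
def verify_text_schema(schema):
    if len(schema) != 1 or schema[0][1] != "string":
        raise ValueError("Text data source supports only a single string column")
    return schema

print(verify_text_schema([("value", "string")]))  # [('value', 'string')]
```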





[GitHub] spark pull request: [SPARK-12562][SQL] DataFrame.write.format(text...

2016-01-03 Thread xguo27
Github user xguo27 commented on the pull request:

https://github.com/apache/spark/pull/10515#issuecomment-168564909
  
@marmbrus Can we trigger a test for this?





[GitHub] spark pull request: [SPARK-12562][SQL] DataFrame.write.format(text...

2015-12-30 Thread xguo27
Github user xguo27 commented on a diff in the pull request:

https://github.com/apache/spark/pull/10515#discussion_r48591393
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/text/TextSuite.scala
 ---
@@ -33,8 +33,8 @@ class TextSuite extends QueryTest with SharedSQLContext {
 verifyFrame(sqlContext.read.text(testFile))
   }
 
-  test("writing") {
-val df = sqlContext.read.text(testFile)
+  test("SPARK-12562 verify write.text() can handle column name beyond 
`value`") {
+val df = sqlContext.read.text(testFile).withColumnRenamed("value", 
"adwrasdf")
--- End diff --

After `write.text()`, the local text file does not actually carry the schema 
name the way JSON does. When reading the text file back and then calling 
`verifyFrame`, it will always have `value` as the column name.





[GitHub] spark pull request: [SPARK-12562][SQL] DataFrame.write.format(text...

2015-12-29 Thread xguo27
Github user xguo27 commented on the pull request:

https://github.com/apache/spark/pull/10515#issuecomment-167915743
  
@marmbrus Thanks Michael for your feedback!

Looks like the 'value' is there to give the single string column an arbitrary 
name. The current implementation strips schema information when creating 
TextRelation (after verifying the schema is a single field of string type). 
This is fine during read, but fails during write.

Would you mind taking another look at my updated change?





[GitHub] spark pull request: [SPARK-12562][SQL] DataFrame.write.format(text...

2015-12-29 Thread xguo27
GitHub user xguo27 opened a pull request:

https://github.com/apache/spark/pull/10515

[SPARK-12562][SQL] DataFrame.write.format(text) requires the column name to 
be called value



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xguo27/spark SPARK-12562

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10515.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10515


commit 4c5c31fc18f2763151a9d4d6f42ceed5eb43d8a7
Author: Xiu Guo <xgu...@gmail.com>
Date:   2015-12-29T23:49:44Z

[SPARK-12562][SQL] DataFrame.write.format(text) requires the column name to 
be called value







[GitHub] spark pull request: [SPARK-12562][SQL] DataFrame.write.format(text...

2015-12-29 Thread xguo27
Github user xguo27 commented on the pull request:

https://github.com/apache/spark/pull/10515#issuecomment-167950674
  
Thanks @viirya ! I have updated the comment and added unit test.





[GitHub] spark pull request: [SPARK-12562][SQL] DataFrame.write.format(text...

2015-12-29 Thread xguo27
Github user xguo27 commented on a diff in the pull request:

https://github.com/apache/spark/pull/10515#discussion_r48590105
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/text/TextSuite.scala
 ---
@@ -58,6 +58,17 @@ class TextSuite extends QueryTest with SharedSQLContext {
 }
   }
 
+  test("SPARK-12562 verify write.text() can handle column name beyond 
`value`") {
--- End diff --

@rxin I thought about it, but was not sure if it was a good idea to change 
the existing test case. In the existing test, should I add a second dataframe 
with the column renamed, or just replace the original dataframe with the 
renamed column?





[GitHub] spark pull request: [SPARK-12512][SQL] support column name with do...

2015-12-28 Thread xguo27
GitHub user xguo27 opened a pull request:

https://github.com/apache/spark/pull/10500

[SPARK-12512][SQL] support column name with dot in withColumn()



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xguo27/spark SPARK-12512

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10500.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10500


commit 1b8d53c4692034ce1b292e74c44db506fdeea9af
Author: Xiu Guo <xgu...@gmail.com>
Date:   2015-12-28T23:37:21Z

[SPARK-12512][SQL] support column name with dot in WithColumn()







[GitHub] spark pull request: [SPARK-12521][SQL][WIP] JDBCRelation does not ...

2015-12-25 Thread xguo27
Github user xguo27 closed the pull request at:

https://github.com/apache/spark/pull/10473





[GitHub] spark pull request: [SPARK-12521][SQL][WIP] JDBCRelation does not ...

2015-12-25 Thread xguo27
Github user xguo27 commented on the pull request:

https://github.com/apache/spark/pull/10473#issuecomment-167258976
  
Thanks @hvanhovell for clarifying it up. I will close this PR.





[GitHub] spark pull request: [SPARK-12521][SQL][WIP] JDBCRelation does not ...

2015-12-24 Thread xguo27
GitHub user xguo27 opened a pull request:

https://github.com/apache/spark/pull/10473

[SPARK-12521][SQL][WIP] JDBCRelation does not honor lowerBound/upperBound

JDBCRelation does not bound the rows when lowerBound/upperBound are given. 
This change honors the given bounds.
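An illustrative sketch (not Spark's actual implementation) of how JDBC partition predicates are typically generated, with the first and last partitions left unbounded on one side — which is why rows outside lowerBound/upperBound still appear in the result:

```python
# Sketch: split [lower, upper) into num_partitions WHERE clauses; the
# first clause has no lower bound and the last has no upper bound, so
# rows outside the given range are still fetched.
def partition_where_clauses(column, lower, upper, num_partitions):
    stride = (upper - lower) // num_partitions
    clauses = []
    previous = None
    current = lower + stride
    for i in range(num_partitions):
        if i == 0:
            clauses.append(f"{column} < {current}")    # open on the low end
        elif i == num_partitions - 1:
            clauses.append(f"{column} >= {previous}")  # open on the high end
        else:
            clauses.append(f"{column} >= {previous} AND {column} < {current}")
        previous = current
        current += stride
    return clauses

print(partition_where_clauses("id", 0, 100, 4))
# ['id < 25', 'id >= 25 AND id < 50', 'id >= 50 AND id < 75', 'id >= 75']
```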

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xguo27/spark SPARK-12521

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10473.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10473


commit b0d03592716369edb390f7811a5d4d530bb0cfe2
Author: Xiu Guo <xgu...@gmail.com>
Date:   2015-12-25T04:08:35Z

[SPARK-12521][SQL] JDBCRelation does not honor lowerBound/upperBound







[GitHub] spark pull request: [SPARK-12521][SQL][WIP] JDBCRelation does not ...

2015-12-24 Thread xguo27
Github user xguo27 commented on the pull request:

https://github.com/apache/spark/pull/10473#issuecomment-167190872
  
Marking it [WIP] to invite discussion here. :) I suspect the original 
code includes infinity on both the lower and upper sides for a reason. 





[GitHub] spark pull request: [SPARK-12456][SQL] Add ExpressionDescription t...

2015-12-22 Thread xguo27
Github user xguo27 commented on the pull request:

https://github.com/apache/spark/pull/10423#issuecomment-166701737
  
@rxin Great, thanks Reynold! My JIRA id is xguo27.





[GitHub] spark pull request: [SPARK-12462][SQL] Add ExpressionDescription t...

2015-12-22 Thread xguo27
GitHub user xguo27 opened a pull request:

https://github.com/apache/spark/pull/10437

[SPARK-12462][SQL] Add ExpressionDescription to misc non-aggregate functions



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xguo27/spark SPARK-12462

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10437.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10437


commit a0e99210ea7e3068cf07b9f042a084ab8223d7f2
Author: Xiu Guo <xgu...@gmail.com>
Date:   2015-12-22T18:54:52Z

[SPARK-12462][SQL] Add ExpressionDescription to misc non-aggregate functions







[GitHub] spark pull request: [SPARK-12456][SQL] Add ExpressionDescription t...

2015-12-21 Thread xguo27
GitHub user xguo27 opened a pull request:

https://github.com/apache/spark/pull/10423

[SPARK-12456][SQL] Add ExpressionDescription to misc functions

First try, not sure how much information we need to provide in the usage 
part.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xguo27/spark SPARK-12456

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/10423.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10423


commit 8cae22cd771eea1bc08d1e0903b5e9df6814
Author: Xiu Guo <xgu...@gmail.com>
Date:   2015-12-21T22:55:44Z

[SPARK-12456][SQL] Add ExpressionDescription to misc functions







[GitHub] spark pull request: [SPARK-12456][SQL] Add ExpressionDescription t...

2015-12-21 Thread xguo27
Github user xguo27 commented on the pull request:

https://github.com/apache/spark/pull/10423#issuecomment-166531147
  
@rxin Thank you very much for going through the changeset, Reynold! I have 
updated it per your suggestions.





[GitHub] spark pull request: [SPARK-11562][SQL] Provide option to switch Sq...

2015-12-19 Thread xguo27
Github user xguo27 commented on a diff in the pull request:

https://github.com/apache/spark/pull/9553#discussion_r48095871
  
--- Diff: repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala ---
@@ -1026,17 +1027,30 @@ class SparkILoop(

   @DeveloperApi
   def createSQLContext(): SQLContext = {
-    val name = "org.apache.spark.sql.hive.HiveContext"
+    useHiveContext = sparkContext.getConf.getBoolean("spark.sql.useHiveContext", true)
+    val name = {
+      if (useHiveContext) "org.apache.spark.sql.hive.HiveContext"
+      else "org.apache.spark.sql.SQLContext"
+    }
+
     val loader = Utils.getContextOrSparkClassLoader
     try {
       sqlContext = loader.loadClass(name).getConstructor(classOf[SparkContext])
         .newInstance(sparkContext).asInstanceOf[SQLContext]
-      logInfo("Created sql context (with Hive support)..")
+      if (useHiveContext) {
+        logInfo("Created sql context (with Hive support). To use sqlContext (without Hive), " +
+          "set spark.sql.useHiveContext to false before launching spark-shell.")
+      }
+      else {
+        logInfo("Created sql context.")
+      }
     }
     catch {
-      case _: java.lang.ClassNotFoundException | _: java.lang.NoClassDefFoundError =>
+      case _: java.lang.ClassNotFoundException | _: java.lang.NoClassDefFoundError
+          if useHiveContext =>
         sqlContext = new SQLContext(sparkContext)
-        logInfo("Created sql context..")
+        logInfo("Created sql context without Hive support, " +
+          "build Spark with -Phive to enable Hive support.")
--- End diff --

When -Phive is used (which provides the necessary Hive jars) and an exception 
other than ClassNotFound/NoClassDefFound occurred, the current handling lets 
the exception propagate without creating an alternative SQLContext.

Do you mean that in this case we should catch -> log -> re-throw? 
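
The catch -> log -> re-throw shape being asked about could look roughly like this. It is only a sketch: `createContext`, `useHiveContext`, and the loggers are assumed from the quoted diff, and this is not the actual patch.

```scala
// Hypothetical shape of catch -> log -> re-throw around context creation.
try {
  sqlContext = createContext()
} catch {
  // Missing Hive classes: fall back to a plain SQLContext, as before.
  case _: ClassNotFoundException | _: NoClassDefFoundError if useHiveContext =>
    logInfo("Hive classes not found, falling back to a plain SQLContext.")
    sqlContext = new SQLContext(sparkContext)
  // Anything else: log and propagate instead of silently creating a fallback.
  case e: Throwable =>
    logError("Failed to create sql context", e)
    throw e
}
```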





[GitHub] spark pull request: [SPARK-11562][SQL] Provide option to switch Sq...

2015-12-18 Thread xguo27
Github user xguo27 commented on the pull request:

https://github.com/apache/spark/pull/9553#issuecomment-165948481
  
@yhuai I just resolved the conflict. Can we trigger a test? Thanks!





[GitHub] spark pull request: [SPARK-11562][SQL] Provide option to switch Sq...

2015-12-08 Thread xguo27
Github user xguo27 commented on the pull request:

https://github.com/apache/spark/pull/9553#issuecomment-162964945
  
Hi @yhuai, do you think this is good to merge?





[GitHub] spark pull request: [SPARK-11562][SQL] Provide option to switch Sq...

2015-11-30 Thread xguo27
Github user xguo27 commented on the pull request:

https://github.com/apache/spark/pull/9553#issuecomment-160779029
  
Hi @yhuai @liancheng:

As I was hitting SPARK-2 when testing my code, I rebased my branch and 
squashed my previous commits together. The new commit addresses the 
following points you brought up:

1. call conf.getBoolean() to get the conf value at the right place
2. using spark.sql.useHiveContext instead of spark.sql.hive.context
3. using if/else instead of cases true/false
4. provide extra information when logging

Thanks for reviewing my code!





[GitHub] spark pull request: [SPARK-11562][SQL] Provide option to switch Sq...

2015-11-30 Thread xguo27
Github user xguo27 commented on the pull request:

https://github.com/apache/spark/pull/9553#issuecomment-160782796
  
Looks like some git plugin network issue?





[GitHub] spark pull request: [SPARK-11482][SQL] Make maven repo for Hive me...

2015-11-29 Thread xguo27
Github user xguo27 commented on the pull request:

https://github.com/apache/spark/pull/9543#issuecomment-160482218
  
Thanks @yhuai for reviewing my code! I have updated per your suggestion.

To answer your question, I personally do not have a use case for this. My 
take on the JIRA reporter's use case is that users might host their own 
customized/modified Hive jars in their own Maven repository to provide 
specific functionality.





[GitHub] spark pull request: [SPARK-11482][SQL] Make maven repo for Hive me...

2015-11-28 Thread xguo27
Github user xguo27 commented on the pull request:

https://github.com/apache/spark/pull/9543#issuecomment-160332622
  
@yhuai I see your latest change conflicts with this PR; I have resolved 
the conflict and re-pushed. @rxin has been reviewing this PR, but I figure 
you might also want to take a look, just in case I break your code.

Thanks!





[GitHub] spark pull request: [SPARK-11631][Scheduler] Adding 'Starting DAGS...

2015-11-24 Thread xguo27
Github user xguo27 commented on the pull request:

https://github.com/apache/spark/pull/9603#issuecomment-159191082
  
OK, I will close it. Thanks!





[GitHub] spark pull request: [SPARK-11631][Scheduler] Adding 'Starting DAGS...

2015-11-24 Thread xguo27
Github user xguo27 closed the pull request at:

https://github.com/apache/spark/pull/9603





[GitHub] spark pull request: [SPARK-11897][SQL] Add @scala.annotations.vara...

2015-11-23 Thread xguo27
GitHub user xguo27 opened a pull request:

https://github.com/apache/spark/pull/9918

[SPARK-11897][SQL] Add @scala.annotations.varargs to sql functions
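
For context, `@scala.annotation.varargs` makes the compiler also emit a Java-style varargs overload, so Java callers can pass arguments directly instead of building a Seq. A minimal illustration, with a hypothetical object and method name (not Spark code):

```scala
import scala.annotation.varargs

object StringFunctions {
  // Without @varargs, Java callers would only see concat(Seq<String>);
  // with it, Java code can also write StringFunctions.concat("a", "b").
  @varargs
  def concat(parts: String*): String = parts.mkString
}
```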



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xguo27/spark SPARK-11897

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/9918.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9918


commit 1dc9751b8e68ab5c1f681b74c2283eb29addc3b8
Author: Xiu Guo <xgu...@gmail.com>
Date:   2015-11-23T22:26:47Z

[SPARK-11897][SQL] Add @scala.annotations.varargs to sql functions







[GitHub] spark pull request: [SPARK-11631][Scheduler] Adding 'Starting DAGS...

2015-11-23 Thread xguo27
Github user xguo27 commented on the pull request:

https://github.com/apache/spark/pull/9603#issuecomment-159079918
  
@andrewor14 What is your take on Jacek's comment? I don't think it's a bad 
idea to make it more consistent with a matching log message. Please let me 
know. Thx! 





[GitHub] spark pull request: [SPARK-11482][SQL] Make maven repo for Hive me...

2015-11-21 Thread xguo27
Github user xguo27 commented on the pull request:

https://github.com/apache/spark/pull/9543#issuecomment-158694089
  
Sorry about the failure, can we re-test please?





[GitHub] spark pull request: [SPARK-11482][SQL] Make maven repo for Hive me...

2015-11-21 Thread xguo27
Github user xguo27 commented on the pull request:

https://github.com/apache/spark/pull/9543#issuecomment-158678073
  
@rxin Thanks, Reynold! Somehow no test was triggered. Not sure why.





[GitHub] spark pull request: [SPARK-11628][SQL] support column datatype of ...

2015-11-21 Thread xguo27
Github user xguo27 commented on the pull request:

https://github.com/apache/spark/pull/9612#issuecomment-158678513
  
@cloud-fan I have added a few tests per your suggestion. Do they look good 
to you?





[GitHub] spark pull request: [SPARK-11562][SQL] Provide user an option to i...

2015-11-19 Thread xguo27
Github user xguo27 commented on the pull request:

https://github.com/apache/spark/pull/9553#issuecomment-158249118
  
@marmbrus @rxin Does this look good to you guys?





[GitHub] spark pull request: [SPARK-11482][SQL] Make maven repo for Hive me...

2015-11-19 Thread xguo27
Github user xguo27 commented on the pull request:

https://github.com/apache/spark/pull/9543#issuecomment-158248923
  
@marmbrus @rxin What do you think about this change?





[GitHub] spark pull request: [SPARK-11631][Scheduler] Adding 'Starting DAGS...

2015-11-16 Thread xguo27
Github user xguo27 commented on the pull request:

https://github.com/apache/spark/pull/9603#issuecomment-157219217
  
I agree it is trivial, just thought I could quickly add a log statement. If 
Jacek agrees, I can close this PR.





[GitHub] spark pull request: [SPARK-11631][Scheduler] Adding 'Starting DAGS...

2015-11-12 Thread xguo27
Github user xguo27 commented on a diff in the pull request:

https://github.com/apache/spark/pull/9603#discussion_r44728655
  
--- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
@@ -506,6 +506,7 @@ class SparkContext(config: SparkConf) extends Logging 
with ExecutorAllocationCli
 val (sched, ts) = SparkContext.createTaskScheduler(this, master)
 _schedulerBackend = sched
 _taskScheduler = ts
+logDebug("Starting DAGScheduler")
--- End diff --

Hi Jacek:

My only concern with putting the log in DAGScheduler's constructor is that 
the logger might not have been initialized when the constructor is called.





[GitHub] spark pull request: [SPARK-11628][SQL][WIP] support column datatyp...

2015-11-12 Thread xguo27
Github user xguo27 commented on the pull request:

https://github.com/apache/spark/pull/9612#issuecomment-156276620
  
Hi Wenchen:

Can you elaborate on using ByteType for char a little more?

Ultimately, the difference between char(x) and varchar(x) is the 
fixed/variable length, which results in padding. So it's a good idea to keep 
the underlying type the same, right?





[GitHub] spark pull request: [SPARK-11628][SQL][WIP] support column datatyp...

2015-11-10 Thread xguo27
GitHub user xguo27 opened a pull request:

https://github.com/apache/spark/pull/9612

[SPARK-11628][SQL][WIP] support column datatype of Char

Can someone review my code to make sure I'm not missing anything? Thanks!

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xguo27/spark SPARK-11628

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/9612.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9612


commit 12a7ba151291691d1695fc456da65fb3c005fc2d
Author: Xiu Guo <gu...@us.ibm.com>
Date:   2015-11-11T00:44:22Z

[SPARK-11628][SQL] support column datatype of Char







[GitHub] spark pull request: [SPARK-11562][SQL] Provide user an option to i...

2015-11-09 Thread xguo27
Github user xguo27 commented on the pull request:

https://github.com/apache/spark/pull/9553#issuecomment-155257787
  
Hi Zhan:

I just updated the documentation and added a guard in the code per your 
feedback on the exception handler.

Thanks!





[GitHub] spark pull request: [SPARK-11482][SQL] Make maven repo for Hive me...

2015-11-09 Thread xguo27
Github user xguo27 commented on the pull request:

https://github.com/apache/spark/pull/9543#issuecomment-155151066
  
Thanks WangTao for your comment!

Based on the comment on my other PR for SPARK-11562, I will also add 
documentation for this.





[GitHub] spark pull request: [SPARK-11562][SQL] Provide user an option to i...

2015-11-08 Thread xguo27
GitHub user xguo27 opened a pull request:

https://github.com/apache/spark/pull/9553

[SPARK-11562][SQL] Provide user an option to init SQLContext or HiveContext 
in spark shell

Introducing a boolean property 'spark.sql.hive.context' to turn HiveContext 
on and off as the default sqlContext type.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xguo27/spark SPARK-11562

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/9553.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9553


commit cb5892cdd605ec70586c0670ed19d924e0a8eade
Author: Xiu Guo <gu...@us.ibm.com>
Date:   2015-11-08T20:33:07Z

[SPARK-11562][SQL] Provide user an option to init SQLContext or HiveContext 
in spark shell







[GitHub] spark pull request: [SPARK-11482][SQL] Make maven repo for Hive me...

2015-11-07 Thread xguo27
GitHub user xguo27 opened a pull request:

https://github.com/apache/spark/pull/9543

[SPARK-11482][SQL] Make maven repo for Hive metastore jars configurable

Introducing a property called "spark.sql.hive.maven.repo" to let user 
configure the maven repository to download Hive Metastore jars.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xguo27/spark SPARK-11482

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/9543.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9543


commit 01352e781ffc62f460c70c865d519531cb336805
Author: Xiu Guo <gu...@us.ibm.com>
Date:   2015-11-07T02:01:04Z

[SPARK-11482][SQL] Make maven repo for Hive metastore jars configurable







[GitHub] spark pull request: [SPARK-11242][SQL] In conf/spark-env.sh.templa...

2015-10-21 Thread xguo27
GitHub user xguo27 opened a pull request:

https://github.com/apache/spark/pull/9201

[SPARK-11242][SQL] In conf/spark-env.sh.template SPARK_DRIVER_MEMORY is 
documented incorrectly

Minor fix on the comment

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xguo27/spark SPARK-11242

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/9201.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9201


commit 5a11872efbb3b871fae900bd0228fcbfb25ad0e1
Author: guoxi <gu...@us.ibm.com>
Date:   2015-10-21T18:56:33Z

[SPARK-11242] In conf/spark-env.sh.template SPARK_DRIVER_MEMORY is 
documented incorrectly







[GitHub] spark pull request: [SPARK-11242][SQL] In conf/spark-env.sh.templa...

2015-10-21 Thread xguo27
Github user xguo27 commented on the pull request:

https://github.com/apache/spark/pull/9201#issuecomment-150054928
  
Right, let me change that too. Thx Sean!

