Github user gczsjdy commented on the issue:
https://github.com/apache/spark/pull/18758
@HyukjinKwon You are right. I will do my best. : )
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have
Github user gczsjdy commented on the issue:
https://github.com/apache/spark/pull/18758
@HyukjinKwon I can try to find other typos. But it's actually hard to find
them until you use a certain API.
GitHub user gczsjdy opened a pull request:
https://github.com/apache/spark/pull/18758
Fix typo in DataframeWriter doc
## What changes were proposed in this pull request?
The format of `none` should be consistent with other compression codecs.
## How was this patch
Github user gczsjdy closed the pull request at:
https://github.com/apache/spark/pull/18632
Github user gczsjdy commented on the issue:
https://github.com/apache/spark/pull/18632
@cloud-fan You are right, thanks. I will close this PR.
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/18632#discussion_r127908258
--- Diff:
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java
---
@@ -51,6 +51,7 @@ public UnsafeRowWriter
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/18632#discussion_r127907518
--- Diff:
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeRowWriter.java
---
@@ -51,6 +51,7 @@ public UnsafeRowWriter
Github user gczsjdy commented on the issue:
https://github.com/apache/spark/pull/18632
@cloud-fan @viirya @gatorsmile Could you please help me review this?
GitHub user gczsjdy opened a pull request:
https://github.com/apache/spark/pull/18632
Reset BufferHolder while initialize an UnsafeRowWriter
## What changes were proposed in this pull request?
`UnsafeRowWriter`'s constructor should contain `BufferHolder.reset` to make
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/18284#discussion_r122123445
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/trees/TreeNodeSuite.scala
---
@@ -146,6 +154,17 @@ class TreeNodeSuite extends
Github user gczsjdy commented on the issue:
https://github.com/apache/spark/pull/17359
@viirya Updated, as the last sentence mentioned, I have tried to make Spark
support `GenericUDAFResolver`. But it lacks some interfaces compared with
`AbstractGenericUDAFResolver` so this can't
Github user gczsjdy commented on the issue:
https://github.com/apache/spark/pull/17359
@viirya Sorry for the late reply. It seems Spark cannot use Hive
`GenericUDAFnGrams`, since Spark only supports subclasses of
`AbstractGenericUDAFResolver` & `UDAF` for Hive UDAF, w
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/17359#discussion_r110823102
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/NGrams.scala
---
@@ -0,0 +1,249 @@
+/*
+ * Licensed
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/17359#discussion_r110820889
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/NGrams.scala
---
@@ -0,0 +1,249 @@
+/*
+ * Licensed
Github user gczsjdy commented on the issue:
https://github.com/apache/spark/pull/17359
@rxin @cloud-fan @gatorsmile @viirya @tejasapatil Could you please help me
review this PR? Or is there anything I can do on this work?
Github user gczsjdy commented on the issue:
https://github.com/apache/spark/pull/16476
@cloud-fan Do you have any comments on this version?
Github user gczsjdy commented on the issue:
https://github.com/apache/spark/pull/17359
@rxin My fault, the example I gave was far from practical use, and I have
updated it. Actually, we can use it whenever it is reasonable to base the
text analysis on frequencies of word sequences
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/17359#discussion_r10755
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/NGrams.scala
---
@@ -0,0 +1,258 @@
+/*
+ * Licensed
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/17359#discussion_r107429936
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/NGrams.scala
---
@@ -0,0 +1,258 @@
+/*
+ * Licensed
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/17359#discussion_r107429714
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/NGrams.scala
---
@@ -0,0 +1,258 @@
+/*
+ * Licensed
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/17359#discussion_r107428643
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/NGrams.scala
---
@@ -0,0 +1,258 @@
+/*
+ * Licensed
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/17359#discussion_r107418753
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/NGrams.scala
---
@@ -0,0 +1,258 @@
+/*
+ * Licensed
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/17359#discussion_r107417310
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/NGrams.scala
---
@@ -0,0 +1,258 @@
+/*
+ * Licensed
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/17359#discussion_r107415412
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/NGrams.scala
---
@@ -0,0 +1,258 @@
+/*
+ * Licensed
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/17359#discussion_r107347663
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/NGrams.scala
---
@@ -0,0 +1,258 @@
+/*
+ * Licensed
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/17359#discussion_r107345616
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/NGrams.scala
---
@@ -0,0 +1,258 @@
+/*
+ * Licensed
Github user gczsjdy commented on the issue:
https://github.com/apache/spark/pull/17359
cc @chenghao-intel @yucai @adrian-wang @cloud-fan @gatorsmile
GitHub user gczsjdy opened a pull request:
https://github.com/apache/spark/pull/17359
Add aggregate expression nGrams
## What changes were proposed in this pull request?
This is the implementation of Hive's `ngrams`, which is a popular
statistical and data mining function
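To make the idea concrete: Hive documents `ngrams` as returning the top-k n-grams from a set of tokenized sentences. The following is a minimal Python sketch of that idea, not the code from this pull request. The helper name `top_k_ngrams` and the sample sentences are hypothetical, and the sketch counts exactly and breaks ties lexicographically, which is an assumption made only for determinism.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def top_k_ngrams(
    sentences: Iterable[Sequence[str]], n: int, k: int
) -> List[Tuple[Tuple[str, ...], int]]:
    """Return the k most frequent n-grams across tokenized sentences.

    Hypothetical helper for illustration only: exact counting, with ties
    ordered lexicographically by n-gram.
    """
    if n < 1 or k < 1:
        raise ValueError("n and k must be positive")
    counts: Counter = Counter()
    for tokens in sentences:
        # Slide a window of width n over each sentence; n-grams never span
        # two sentences, and sentences shorter than n contribute nothing.
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i : i + n])] += 1
    # Sort by descending frequency, then by n-gram for a deterministic order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


sentences = [
    ["spark", "sql", "is", "fast"],
    ["spark", "sql", "is", "scalable"],
    ["hive", "sql", "is", "familiar"],
]
print(top_k_ngrams(sentences, n=2, k=2))
# [(('sql', 'is'), 3), (('spark', 'sql'), 2)]
```

An aggregate expression would have to keep and merge such counts across partitions; the sketch leaves that out and only shows the per-sentence windowing and the final top-k selection.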
Github user gczsjdy commented on the issue:
https://github.com/apache/spark/pull/16476
@cloud-fan I have submitted a new version to support implicit cast.
We determine the implicit cast type in analysis stage, so maybe we won't do
`eval` then, so we can't be 100% intelligent
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/16476#discussion_r106334932
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
---
@@ -340,3 +343,105 @@ object CaseKeyWhen
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/16476#discussion_r106334083
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
---
@@ -340,3 +343,105 @@ object CaseKeyWhen
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/16476#discussion_r106157856
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
---
@@ -340,3 +343,105 @@ object CaseKeyWhen
Github user gczsjdy commented on the issue:
https://github.com/apache/spark/pull/16476
@viirya I have tested with Hive 2.1.1: `select field(2, '2', 2);` returns 2
there, while it returns 1 in MySQL. I also read Hive's code, where I
found it first compares the data type which
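The difference reported above can be illustrated with a small Python sketch. It is not Hive's or MySQL's code. The names `field_strict` and `field_coercing` are hypothetical, and casting every argument to a float is an assumed stand-in for numeric coercion of mixed-type arguments.

```python
from typing import Any, Optional


def field_strict(needle: Any, *candidates: Any) -> int:
    """1-based index of the first candidate with the same type and value, else 0."""
    for position, candidate in enumerate(candidates, start=1):
        # The type is checked first, so the string '2' never matches the int 2.
        if type(candidate) is type(needle) and candidate == needle:
            return position
    return 0


def _as_number(value: Any) -> Optional[float]:
    """Cast to float, or return None when the value is not numeric (assumption)."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def field_coercing(needle: Any, *candidates: Any) -> int:
    """1-based index of the first candidate equal to needle after numeric coercion, else 0."""
    target = _as_number(needle)
    if target is None:
        return 0
    for position, candidate in enumerate(candidates, start=1):
        if _as_number(candidate) == target:
            return position
    return 0


print(field_strict(2, "2", 2))    # 2, the result reported for Hive
print(field_coercing(2, "2", 2))  # 1, the result reported for MySQL
```

The type-first variant skips `'2'` and matches the integer in position 2, while the coercing variant already matches `'2'` in position 1.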
Github user gczsjdy commented on the issue:
https://github.com/apache/spark/pull/16476
@cloud-fan I have checked some other popular RDBMSs, such as Oracle,
Microsoft SQL Server, PostgreSQL, DB2, and SQLite; none of them supports the
function `FIELD`.
In my opinion, Hive has an important
Github user gczsjdy commented on the issue:
https://github.com/apache/spark/pull/16476
@gatorsmile Do you think we can merge this PR? Or is there something that
needs to be modified?
Github user gczsjdy commented on the issue:
https://github.com/apache/spark/pull/16476
@tejasapatil Got it, thanks for your review. :+1:
Github user gczsjdy commented on the issue:
https://github.com/apache/spark/pull/16476
@tejasapatil Sorry to bother you, is there still something that needs to be modified?
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/16476#discussion_r102671021
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
---
@@ -340,3 +343,105 @@ object CaseKeyWhen
Github user gczsjdy commented on the issue:
https://github.com/apache/spark/pull/16476
@cloud-fan Could you please help me review this PR?
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/16476#discussion_r101770055
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
---
@@ -340,3 +341,91 @@ object CaseKeyWhen
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/16476#discussion_r101766864
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
---
@@ -340,3 +343,104 @@ object CaseKeyWhen
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/16476#discussion_r101673768
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
---
@@ -340,3 +341,91 @@ object CaseKeyWhen
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/16476#discussion_r101505199
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
---
@@ -340,3 +341,91 @@ object CaseKeyWhen
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/16476#discussion_r101468470
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
---
@@ -340,3 +343,104 @@ object CaseKeyWhen
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/16476#discussion_r101466954
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
---
@@ -340,3 +344,96 @@ object CaseKeyWhen
Github user gczsjdy commented on the issue:
https://github.com/apache/spark/pull/16476
@gatorsmile Hi, this patch has passed all tests, is there some code I
still need to modify? Thank you for working on this.
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/16476#discussion_r99753353
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
---
@@ -20,8 +20,12 @@ package
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/16476#discussion_r99535960
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
---
@@ -340,3 +344,99 @@ object CaseKeyWhen
Github user gczsjdy commented on the issue:
https://github.com/apache/spark/pull/16476
ping @rxin @gatorsmile @tejasapatil @viirya
Github user gczsjdy commented on the issue:
https://github.com/apache/spark/pull/16476
Thanks @viirya for the review and sorry for the late reply.
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/16476#discussion_r96348295
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
---
@@ -340,3 +344,96 @@ object CaseKeyWhen
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/16476#discussion_r96347397
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
---
@@ -340,3 +344,96 @@ object CaseKeyWhen
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/16476#discussion_r96347281
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala ---
@@ -17,11 +17,13 @@
package org.apache.spark.sql
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/16476#discussion_r96347166
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
---
@@ -340,3 +344,96 @@ object CaseKeyWhen
Github user gczsjdy closed the pull request at:
https://github.com/apache/spark/pull/16559
Github user gczsjdy commented on the issue:
https://github.com/apache/spark/pull/16559
Thanks for the information @rxin @aray @cloud-fan , I will close this PR.
Sorry for the late reply.
Github user gczsjdy commented on the issue:
https://github.com/apache/spark/pull/16559
cc @chenghao-intel @adrian-wang
GitHub user gczsjdy opened a pull request:
https://github.com/apache/spark/pull/16559
Add expression index and test cases
## What changes were proposed in this pull request?
Expression Index returns the value of index n (right) of the array/map
(left).
To be consistent
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/16476#discussion_r95732577
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
---
@@ -340,3 +344,96 @@ object CaseKeyWhen
Github user gczsjdy commented on the issue:
https://github.com/apache/spark/pull/16476
@chenghao-intel
I think that the optimize rule will fold the parameters that have different
types with param0, and then disorganize the parameters' index. Thanks.
@rxin
I have removed
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/16476#discussion_r95294460
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
---
@@ -340,3 +344,102 @@ object CaseKeyWhen
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/16476#discussion_r95294259
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
---
@@ -340,3 +344,102 @@ object CaseKeyWhen
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/16476#discussion_r95292505
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
---
@@ -340,3 +344,102 @@ object CaseKeyWhen
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/16476#discussion_r95292481
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
---
@@ -340,3 +344,102 @@ object CaseKeyWhen
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/16476#discussion_r95102912
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
---
@@ -340,3 +341,91 @@ object CaseKeyWhen
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/16476#discussion_r95102842
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
---
@@ -340,3 +341,91 @@ object CaseKeyWhen
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/16476#discussion_r95080770
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
---
@@ -340,3 +341,91 @@ object CaseKeyWhen
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/16476#discussion_r95080738
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
---
@@ -340,3 +341,91 @@ object CaseKeyWhen
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/16476#discussion_r95080333
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
---
@@ -340,3 +341,91 @@ object CaseKeyWhen
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/16476#discussion_r95080295
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
---
@@ -340,3 +341,91 @@ object CaseKeyWhen
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/16476#discussion_r95079031
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala
---
@@ -137,4 +139,48 @@ class
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/16476#discussion_r95078909
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -1528,6 +1528,18 @@ object functions {
def factorial(e: Column): Column
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/16476#discussion_r95060664
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
---
@@ -340,3 +341,91 @@ object CaseKeyWhen
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/16476#discussion_r95060633
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionalExpressions.scala
---
@@ -340,3 +341,91 @@ object CaseKeyWhen
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/16476#discussion_r95060564
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala
---
@@ -137,4 +139,48 @@ class
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/16476#discussion_r95060448
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala
---
@@ -137,4 +139,48 @@ class
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/16476#discussion_r95060436
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala
---
@@ -137,4 +139,48 @@ class
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/16476#discussion_r95060369
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala
---
@@ -137,4 +139,48 @@ class
Github user gczsjdy commented on a diff in the pull request:
https://github.com/apache/spark/pull/16476#discussion_r95060307
--- Diff:
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala
---
@@ -137,4 +139,48 @@ class
Github user gczsjdy commented on the issue:
https://github.com/apache/spark/pull/16476
cc @chenghao-intel @adrian-wang
GitHub user gczsjdy opened a pull request:
https://github.com/apache/spark/pull/16476
Implement expression field
This is an implementation of Hive's `field`.
`field(str, str1, str2, ...)` is a variable-length (>=2) function which
returns the index of str in the list (s
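The basic contract described here can be sketched in a few lines of Python. This is an illustration under assumptions, not the code from the pull request. The name `field_index` is hypothetical, and the 1-based index, the 0 for "not found", and the 0 for a null first argument follow the documented Hive/MySQL `FIELD` behavior rather than anything stated in the truncated description above.

```python
from typing import Optional


def field_index(needle: Optional[str], *candidates: Optional[str]) -> int:
    """Return the 1-based position of needle among candidates, or 0 if absent."""
    if not candidates:
        # The function is variable-length but needs at least two arguments.
        raise ValueError("field needs at least two arguments")
    if needle is None:
        # Assumed: a null search value never matches anything.
        return 0
    for position, candidate in enumerate(candidates, start=1):
        if candidate == needle:
            return position
    return 0


print(field_index("b", "a", "b", "c"))  # 2
print(field_index("z", "a", "b", "c"))  # 0
```

Returning 0 instead of raising keeps the function usable inside sort expressions such as `ORDER BY field(...)`, where unmatched rows simply sort first.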