[jira] [Commented] (SPARK-12988) Can't drop columns that contain dots

2016-05-31 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15309135#comment-15309135
 ] 

Yin Huai commented on SPARK-12988:
--

This issue has been resolved by https://github.com/apache/spark/pull/13306.

> Can't drop columns that contain dots
> 
>
> Key: SPARK-12988
> URL: https://issues.apache.org/jira/browse/SPARK-12988
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.6.0
> Reporter: Michael Armbrust
> Assignee: Sean Zhong
> Fix For: 2.0.0
>
>
> Neither of these works:
> {code}
> val df = Seq((1, 1)).toDF("a_b", "a.c")
> df.drop("a.c").collect()
> df: org.apache.spark.sql.DataFrame = [a_b: int, a.c: int]
> {code}
> {code}
> val df = Seq((1, 1)).toDF("a_b", "a.c")
> df.drop("`a.c`").collect()
> df: org.apache.spark.sql.DataFrame = [a_b: int, a.c: int]
> {code}
> Given that you can't use drop to drop subfields, it seems to me that we 
> should treat the column name literally (i.e. as though it is wrapped in back 
> ticks).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12988) Can't drop columns that contain dots

2016-05-25 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300991#comment-15300991
 ] 

Apache Spark commented on SPARK-12988:
--

User 'clockfly' has created a pull request for this issue:
https://github.com/apache/spark/pull/13306




[jira] [Commented] (SPARK-12988) Can't drop columns that contain dots

2016-05-04 Thread Jorge Machado (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15270350#comment-15270350
 ] 

Jorge Machado commented on SPARK-12988:
---

+1




[jira] [Commented] (SPARK-12988) Can't drop columns that contain dots

2016-02-02 Thread Wenchen Fan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15128745#comment-15128745
 ] 

Wenchen Fan commented on SPARK-12988:
-

I'd also like to forbid the use of invalid column names in `drop`.




[jira] [Commented] (SPARK-12988) Can't drop columns that contain dots

2016-02-02 Thread Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15128829#comment-15128829
 ] 

Yan commented on SPARK-12988:
-

My thinking is that projections should parse the column names, while the 
schema-based ops should keep the names as is. One thing I'm not sure about is 
"Column". Given its current capabilities, it seems to be meant for projections, 
so its name should be backticked if it contains a '.'. But please correct me if 
I'm wrong here.
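
The distinction drawn above can be sketched in plain Scala. This is only an illustration under stated assumptions, not Spark's parser: `ColumnPath`, `parse` and `literal` are hypothetical names. A projection-style lookup splits the name on dots that fall outside backticks, while a schema-based operation keeps the whole string as a single name.

```scala
// Hypothetical sketch, not Spark code: contrasts "parse the name as a path"
// with "keep the name as is".
object ColumnPath {
  /** Projection-style: split on dots that are outside backticks. */
  def parse(path: String): List[String] = {
    val parts = scala.collection.mutable.ListBuffer.empty[String]
    val current = new java.lang.StringBuilder
    var quoted = false
    path.foreach { ch =>
      if (ch == '`') {
        quoted = !quoted            // toggle quoting, do not keep the backtick
      } else if (ch == '.' && !quoted) {
        parts += current.toString   // an unquoted dot ends the current part
        current.setLength(0)
      } else {
        current.append(ch)
      }
    }
    parts += current.toString
    parts.toList
  }

  /** Schema-style: the whole string is one top-level name. */
  def literal(name: String): List[String] = List(name)
}
```

Under this sketch, `ColumnPath.parse("a.c")` yields the two-part path `List("a", "c")`, `ColumnPath.parse("`a.c`")` yields the single name `List("a.c")`, and `ColumnPath.literal("a.c")` never looks at the dot at all.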




[jira] [Commented] (SPARK-12988) Can't drop columns that contain dots

2016-02-02 Thread Dilip Biswal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15128902#comment-15128902
 ] 

Dilip Biswal commented on SPARK-12988:
--

The subtle difference between a column path and a column name may not be 
obvious to a typical user of this API. 

val df = Seq((1, 1)).toDF("a_b", "a.b")
df.select("`a.b`")
df.drop("`a.b`") => would it be obvious to the user that backticks cannot be 
used here?

I believe that was the motivation for allowing it, but I am not sure of its 
implications.




[jira] [Commented] (SPARK-12988) Can't drop columns that contain dots

2016-02-02 Thread Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127883#comment-15127883
 ] 

Yan commented on SPARK-12988:
-

[~marmbrus] For the same reason that "`a.c` is an invalid column name. 
toDF(...) should not accept that", can we require that df.drop not take 
backticks either, since df.drop can only drop top-level columns? 
Programmatically it makes little difference, but it seems more consistent 
semantically. 




[jira] [Commented] (SPARK-12988) Can't drop columns that contain dots

2016-01-26 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15118772#comment-15118772
 ] 

Apache Spark commented on SPARK-12988:
--

User 'dilipbiswal' has created a pull request for this issue:
https://github.com/apache/spark/pull/10943




[jira] [Commented] (SPARK-12988) Can't drop columns that contain dots

2016-01-26 Thread Dilip Biswal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15118775#comment-15118775
 ] 

Dilip Biswal commented on SPARK-12988:
--

[~marmbrus][~rxin] Thanks for your input.




[jira] [Commented] (SPARK-12988) Can't drop columns that contain dots

2016-01-26 Thread Dilip Biswal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15117509#comment-15117509
 ] 

Dilip Biswal commented on SPARK-12988:
--

I would like to work on this one.




[jira] [Commented] (SPARK-12988) Can't drop columns that contain dots

2016-01-26 Thread Dilip Biswal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15118319#comment-15118319
 ] 

Dilip Biswal commented on SPARK-12988:
--

[~marmbrus] Hi Michael, need your input on the semantics.

Say we have a DataFrame defined as follows:
val df = Seq((1, 1, 1)).toDF("a_b", "a.c", "`a.c`")

df.drop("a.c")  => Should we remove the 2nd column here?
df.drop("`a.c`") => Should we remove the 3rd column here?

Regards,
-- Dilip




[jira] [Commented] (SPARK-12988) Can't drop columns that contain dots

2016-01-26 Thread Michael Armbrust (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15118433#comment-15118433
 ] 

Michael Armbrust commented on SPARK-12988:
--

Here are my thoughts after discussing with [~rxin]:

 - {{`a.c`}} is an invalid column name. {{toDF(...)}} should not accept it 
(this can be fixed in another JIRA).
 - {{df.drop(...)}} can only be used to drop top-level columns, so there is no 
reason to ever interpret the dots. Thus {{df.drop("a.c")}} should interpret 
the name as though it were wrapped in {{``}} and drop the second column.
 - {{df.drop("`a.c`")}} should probably also work (i.e. just strip the {{``}}). 
This seems the least surprising, since it then works like other APIs.
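
The semantics proposed in the list above can be modelled in a few lines of plain Scala. This is a minimal sketch of the proposal only, not the code that was merged into Spark: `DropSemantics`, `stripBackticks` and `drop` are hypothetical names, and the schema is reduced to a list of top-level column names.

```scala
// Hypothetical model of the proposed behaviour; the schema is just a list
// of top-level column names and nothing here touches Spark itself.
object DropSemantics {
  /** Remove one pair of enclosing backticks, if present. */
  def stripBackticks(name: String): String =
    if (name.length >= 2 && name.startsWith("`") && name.endsWith("`"))
      name.substring(1, name.length - 1)
    else
      name

  /** Drop a top-level column by its literal name; dots are never parsed. */
  def drop(columns: List[String], name: String): List[String] = {
    val literal = stripBackticks(name)
    columns.filterNot(_ == literal)
  }
}
```

With `val cols = List("a_b", "a.c")`, both `DropSemantics.drop(cols, "a.c")` and `DropSemantics.drop(cols, "`a.c`")` leave only `a_b` in this model, and dropping a name that is not in the list returns the list unchanged.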
