[jira] [Commented] (SPARK-18502) Spark does not handle columns that contain backquote (`)
[ https://issues.apache.org/jira/browse/SPARK-18502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643524#comment-17643524 ]

Bjørn Jørgensen commented on SPARK-18502:
-----------------------------------------

I just answered this problem in u...@spark.org

    df = spark.createDataFrame(
        [("china", "asia"), ("colombia", "south america`")],
        ["country", "continent`"]
    )
    df.show()

    +--------+--------------+
    | country|    continent`|
    +--------+--------------+
    |   china|          asia|
    |colombia|south america`|
    +--------+--------------+

    df.select("continent`").show(1)

    (...)AnalysisException: Syntax error in attribute name: continent`.

    clean_df = df.toDF(*(c.replace('`', '_') for c in df.columns))
    clean_df.show()

    +--------+--------------+
    | country|    continent_|
    +--------+--------------+
    |   china|          asia|
    |colombia|south america`|
    +--------+--------------+

    clean_df.select("continent_").show(2)

    +--------------+
    |    continent_|
    +--------------+
    |          asia|
    |south america`|
    +--------------+

Examples are from [MungingData Avoiding Dots / Periods in PySpark Column Names|https://mungingdata.com/pyspark/avoid-dots-periods-column-names/]

> Spark does not handle columns that contain backquote (`)
> --------------------------------------------------------
>
>                 Key: SPARK-18502
>                 URL: https://issues.apache.org/jira/browse/SPARK-18502
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Barry Becker
>            Priority: Minor
>              Labels: bulk-closed
>
> I know that if a column contains dots or hyphens we can put
> backquotes/backticks around it, but what if the column contains a backtick
> (`)? Can the back tick be escaped by some means?
> Here is an example of the sort of error I see
> {code}
> org.apache.spark.sql.AnalysisException: syntax error in attribute name: `Invoice`Date`;
> org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.e$1(unresolved.scala:99)
> org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.parseAttributeName(unresolved.scala:109)
> org.apache.spark.sql.catalyst.analysis.UnresolvedAttribute$.quotedString(unresolved.scala:90)
> org.apache.spark.sql.Column.<init>(Column.scala:113)
> org.apache.spark.sql.Column$.apply(Column.scala:36)
> org.apache.spark.sql.functions$.min(functions.scala:407)
> com.mineset.spark.vizagg.vizbin.strategies.DateBinStrategy.getDateExtent(DateBinStrategy.scala:158)
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
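The renaming workaround in the comment above replaces every backtick with an underscore, which could map two different columns (say "a`" and "a_") to the same name. A minimal sketch of a collision-safe variant follows. It is plain Python over a list of names, and the helper name sanitize_column_names and its numbering scheme are assumptions made for illustration, not part of Spark or of the original comment.

```python
def sanitize_column_names(columns, replacement="_"):
    """Replace backticks in column names and keep the resulting names unique.

    Hypothetical helper for illustration; it only manipulates strings.
    """
    seen = set()
    cleaned = []
    for name in columns:
        candidate = name.replace("`", replacement)
        # If the cleaned name collides with one already produced,
        # append a counter until it is unique again.
        base, counter = candidate, 1
        while candidate in seen:
            counter += 1
            candidate = f"{base}_{counter}"
        seen.add(candidate)
        cleaned.append(candidate)
    return cleaned


print(sanitize_column_names(["country", "continent`"]))
print(sanitize_column_names(["a`", "a_"]))
```

With a DataFrame, the result would presumably be applied the same way as in the comment, for example df.toDF(*sanitize_column_names(df.columns)).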
[jira] [Commented] (SPARK-18502) Spark does not handle columns that contain backquote (`)
[ https://issues.apache.org/jira/browse/SPARK-18502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17041602#comment-17041602 ]

Gan Wei commented on SPARK-18502:
---------------------------------

Is there a resolution for this issue? I am encountering the same problem when selecting a column whose name contains a backtick "`".

{code:java}
df.select("a`b`").show(1)
{code}

The error message is:

{code:java}
org.apache.spark.sql.AnalysisException: syntax error in attribute name
{code}
[jira] [Commented] (SPARK-18502) Spark does not handle columns that contain backquote (`)
[ https://issues.apache.org/jira/browse/SPARK-18502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16065990#comment-16065990 ]

Sudeshna Bora commented on SPARK-18502:
---------------------------------------

What is the expected timeline for resolving this bug? Currently, my dataset has columns whose names contain backticks as a special character. It fails when instantiating such a Column (new Column()) and when using the dataset.NumericColumns() and dataset.select() APIs. Is there a known workaround for this?
[jira] [Commented] (SPARK-18502) Spark does not handle columns that contain backquote (`)
[ https://issues.apache.org/jira/browse/SPARK-18502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15707196#comment-15707196 ]

Takeshi Yamamuro commented on SPARK-18502:
------------------------------------------

Currently, AFAIK, no. However, the SQL standard (http://savage.net.au/SQL/sql-99.bnf.html#delimited%20identifier) specifies a double quotation mark (") as the escape character, and I feel we need a general approach to escaping these metacharacters in Spark. Other databases certainly allow backquotes in column names, e.g. PostgreSQL:

{code}
postgres=# create table test_table("i`d" INT, "value" VARCHAR);
CREATE TABLE

postgres=# \d test_table
     Table "public.test_table"
 Column |       Type        | Modifiers
--------+-------------------+-----------
 i`d    | integer           |
 value  | character varying |

postgres=# insert into test_table values(1, 'aa');
INSERT 0 1

postgres=# select "i`d" from test_table;
 i`d
-----
   1
(1 row)
{code}
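The delimited-identifier rule from the SQL standard mentioned above can be sketched in a few lines: the name is wrapped in double quotes, and a double quote inside the name is written twice. The helper name quote_identifier is an assumption made for illustration; the sketch only builds strings and says nothing about which quoting forms a given Spark version accepts.

```python
def quote_identifier(name):
    """Quote a name as an SQL-standard delimited identifier.

    Hypothetical helper for illustration: wraps the name in double quotes
    and doubles any double quote that appears inside it.
    """
    if not name:
        raise ValueError("identifier must not be empty")
    return '"' + name.replace('"', '""') + '"'


# A backtick needs no special treatment inside a double-quoted identifier,
# which is why the PostgreSQL example can use "i`d" directly.
print(quote_identifier("i`d"))
print(quote_identifier('say "hi"'))
```

Doubling the delimiter keeps the quoting reversible: a reader of the quoted form can always tell an embedded quote from the closing one.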
[jira] [Commented] (SPARK-18502) Spark does not handle columns that contain backquote (`)
[ https://issues.apache.org/jira/browse/SPARK-18502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15706389#comment-15706389 ]

Barry Becker commented on SPARK-18502:
--------------------------------------

Is there a way to escape the backtick when it appears in a column name?
[jira] [Commented] (SPARK-18502) Spark does not handle columns that contain backquote (`)
[ https://issues.apache.org/jira/browse/SPARK-18502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15706385#comment-15706385 ]

Kazuaki Ishizaki commented on SPARK-18502:
------------------------------------------

I can reproduce this exception with the following program. However, the program itself is not correct: according to this comment, a backtick cannot be used inside a name part (e.g. {{`Invoice`Date`}}). I think that Spark expects {{`Invoice`.Date}}. Does that make sense?

{code}
val df = Seq(("11"), ("12")).toDF("`Invoice`Date`")
df.select($"`Invoice`Date`").show
{code}
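To illustrate the rule described in the comment above, here is a simplified sketch of an attribute-name parser in which backticks may only wrap a whole name part and parts are separated by dots. It is not Spark's implementation: the function name parse_attribute_name, the use of ValueError, and the exact checks are assumptions chosen to mirror the behaviour reported in this thread.

```python
def parse_attribute_name(name):
    """Split an attribute name into parts; backticks may only wrap a whole part.

    Illustrative sketch only, not Spark's actual parser.
    """
    def error():
        return ValueError(f"syntax error in attribute name: {name}")

    parts = []
    current = []
    in_backtick = False
    i = 0
    while i < len(name):
        ch = name[i]
        if in_backtick:
            if ch == "`":
                # A closing backtick must end the name or be followed by a dot.
                in_backtick = False
                if i + 1 < len(name) and name[i + 1] != ".":
                    raise error()
            else:
                # Inside backticks every character, including a dot, is literal.
                current.append(ch)
        elif ch == "`":
            # An opening backtick is only allowed at the start of a part.
            if current:
                raise error()
            in_backtick = True
        elif ch == ".":
            # Reject leading, trailing, and doubled dots.
            if i == 0 or name[i - 1] == "." or i == len(name) - 1:
                raise error()
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    if in_backtick:
        raise error()
    parts.append("".join(current))
    return parts


for candidate in ["`Invoice`.Date", "`a.b`", "`Invoice`Date`", "continent`"]:
    try:
        print(candidate, "->", parse_attribute_name(candidate))
    except ValueError as exc:
        print(candidate, "->", exc)
```

Under these rules a backtick in the middle of a part has no valid reading, which matches the errors reported for names such as `Invoice`Date` and continent` in this thread.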
[jira] [Commented] (SPARK-18502) Spark does not handle columns that contain backquote (`)
[ https://issues.apache.org/jira/browse/SPARK-18502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15697210#comment-15697210 ]

Takeshi Yamamuro commented on SPARK-18502:
------------------------------------------

Could you give us a simple query to reproduce this? I tried a simple query, but it passed:

{code}
scala> val df = Seq(("a", 1), ("b", 2), ("c", 1), ("d", 5)).toDF("`k`ey`", "value")
df: org.apache.spark.sql.DataFrame = [`k`ey`: string, value: int]

scala> df.show
+------+-----+
|`k`ey`|value|
+------+-----+
|     a|    1|
|     b|    2|
|     c|    1|
|     d|    5|
+------+-----+
{code}