[jira] [Commented] (SPARK-26911) Spark do not see column in table

2019-02-18 Thread Vitaly Larchenkov (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16771633#comment-16771633
 ] 

Vitaly Larchenkov commented on SPARK-26911:
---

Yeah, will do that in a few days.

> Spark do not see column in table
> 
>
> Key: SPARK-26911
> URL: https://issues.apache.org/jira/browse/SPARK-26911
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 2.3.1
> Environment: PySpark (Spark 2.3.1)
> Reporter: Vitaly Larchenkov
> Priority: Major
>
> Spark cannot resolve a column that actually exists in the list of input columns:
> {code:java}
> org.apache.spark.sql.AnalysisException: cannot resolve '`id`' given input 
> columns: [flid.palfl_timestamp, flid.id, flid.pal_state, flid.prs_id, 
> flid.bank_id, flid.wr_id, flid.link_id]; {code}
>  
>  
> {code:java}
> ---
> Py4JJavaError Traceback (most recent call last)
> /usr/share/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
>  62 try:
> ---> 63 return f(*a, **kw)
>  64 except py4j.protocol.Py4JJavaError as e:
> /usr/share/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py in 
> get_return_value(answer, gateway_client, target_id, name)
> 327 "An error occurred while calling {0}{1}{2}.\n".
> --> 328 format(target_id, ".", name), value)
> 329 else:
> Py4JJavaError: An error occurred while calling o35.sql.
> : org.apache.spark.sql.AnalysisException: cannot resolve '`id`' given input 
> columns: [flid.palfl_timestamp, flid.id, flid.pal_state, flid.prs_id, 
> flid.bank_id, flid.wr_id, flid.link_id]; line 10 pos 98;
> 'Project ['multiples.id, 'multiples.link_id]
> {code}
>  
> Query:
> {code:java}
> q = f"""
> with flid as (
> select * from flow_log_by_id
> )
> select multiples.id, multiples.link_id
> from (select fl.id, fl.link_id
> from (select id from {flow_log_by_id} group by id having count(*) > 1) 
> multiples
> join {flow_log_by_id} fl on fl.id = multiples.id) multiples
> join {level_link} ll
> on multiples.link_id = ll.link_id_old and ll.link_id_new in (select link_id 
> from flid where id = multiples.id)
> """
> flow_subset_test_result = spark.sql(q)
> {code}
> The `with flid` CTE is used because without it Spark does not find the 
> `flow_log_by_id` table, which looks like a separate issue. In SQL the query 
> works without problems.
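
The quoted report says the same query works in SQL but does not name the engine. As a rough cross-check of that claim, and a possible starting point for a self-contained reproducer, the sketch below runs the same query shape against SQLite from the Python standard library. SQLite is only a stand-in here, the sample rows are invented, and the tables are reduced to the columns the query touches; none of this comes from the reporter's environment, and it says nothing about how Spark 2.3.1 resolves the correlated `id` reference.

```python
import sqlite3

# Hypothetical in-memory data. Table and column names follow the report;
# the rows themselves are invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table flow_log_by_id (id integer, link_id integer);
    create table level_link (link_id_old integer, link_id_new integer);
    insert into flow_log_by_id values (1, 10), (1, 11), (2, 20);
    insert into level_link values (10, 11), (20, 21);
""")

# Same shape as the reported query, with the f-string placeholders
# replaced by literal table names.
query = """
with flid as (
    select * from flow_log_by_id
)
select multiples.id, multiples.link_id
from (select fl.id, fl.link_id
      from (select id from flow_log_by_id
            group by id having count(*) > 1) multiples
      join flow_log_by_id fl on fl.id = multiples.id) multiples
join level_link ll
  on multiples.link_id = ll.link_id_old
 and ll.link_id_new in (select link_id from flid where id = multiples.id)
"""

# The unqualified `id` inside the IN subquery refers to flid.id, while
# multiples.id is a correlated reference to the outer derived table.
rows = conn.execute(query).fetchall()
print(rows)
```

Registering the same two tables as temporary views in a SparkSession and passing the query string to `spark.sql` would turn this into a Spark-side reproducer; whether the AnalysisException still appears there would need to be checked on the affected version and on current master.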



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26911) Spark do not see column in table

2019-02-18 Thread Hyukjin Kwon (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16771628#comment-16771628
 ] 

Hyukjin Kwon commented on SPARK-26911:
--

Can you make the reproducer self-runnable and narrow down the problem? This 
sounds more like a request for investigation than an issue report. I am 
resolving this until sufficient information is provided for other people to 
investigate further. If you have a fix, reopen the issue and open a PR right away.







[jira] [Commented] (SPARK-26911) Spark do not see column in table

2019-02-18 Thread Marco Gaido (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16770964#comment-16770964
 ] 

Marco Gaido commented on SPARK-26911:
-

Could you please check whether current master is still affected? Also, can you 
provide a reproducer? Otherwise it is impossible to investigate the issue. 
Thanks.



