[ https://issues.apache.org/jira/browse/SPARK-13572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Marcelo Vanzin updated SPARK-13572:
-----------------------------------
    Comment: was deleted

(was: Good day,

Thank you for your e-mail. I am currently out of the office and will return on 27.06.2016. Your e-mail will not be forwarded.

Kind regards

Russ Weedon
Professional ERP Engineer
SBB AG
SBB Informatik
Solution Center Finanzen, K-SCM, HR und Immobilien
Lindenhofstrasse 1, 3000 Bern 65
Mobile +41 78 812 47 62
russ.wee...@sbb.ch / www.sbb.ch
)

> HiveContext reads avro Hive tables incorrectly
> -----------------------------------------------
>
>                 Key: SPARK-13572
>                 URL: https://issues.apache.org/jira/browse/SPARK-13572
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.5.2, 1.6.0, 1.6.1
>         Environment: Hive 0.13.1, Spark 1.5.2
>            Reporter: Zoltan Fedor
>            Priority: Major
>         Attachments: logs, table_definition
>
>
> I am using PySpark to read avro-based tables from Hive. The tables can be
> read, but some of the columns are read incorrectly, showing the value
> {{None}} instead of the actual value.
> {noformat}
> >>> results_df = sqlContext.sql("""SELECT * FROM trmdw_prod.opsconsole_ingest where year=2016 and month=2 and day=29 limit 3""")
> >>> results_df.take(3)
> [Row(kafkaoffsetgeneration=None, kafkapartition=None, kafkaoffset=None,
> uuid=None, mid=None, iid=None, product=None, utctime=None, statcode=None,
> statvalue=None, displayname=None, category=None,
> source_filename=u'ops-20160228_23_35_01.gz', year=2016, month=2, day=29),
> Row(kafkaoffsetgeneration=None, kafkapartition=None, kafkaoffset=None,
> uuid=None, mid=None, iid=None, product=None, utctime=None, statcode=None,
> statvalue=None, displayname=None, category=None,
> source_filename=u'ops-20160228_23_35_01.gz', year=2016, month=2, day=29),
> Row(kafkaoffsetgeneration=None, kafkapartition=None, kafkaoffset=None,
> uuid=None, mid=None, iid=None, product=None, utctime=None, statcode=None,
> statvalue=None, displayname=None, category=None,
> source_filename=u'ops-20160228_23_35_01.gz', year=2016, month=2, day=29)]
> {noformat}
> Note the {{None}} values in most of the fields. Surprisingly, not all fields
> are affected: only some of them show {{None}} instead of the real values.
> The table definition does not show anything specific about these columns.
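To pin down exactly which columns are affected, the rows returned by `take()` can be scanned for columns that are `None` in every row. The following is a minimal sketch and not part of the original report: the helper name `all_null_columns` and the trimmed sample data are hypothetical, and the helper works on plain dicts so that it runs without a Spark installation.

```python
def all_null_columns(rows):
    """Return the names of columns whose value is None in every row.

    `rows` is an iterable of dicts mapping column name to value.
    Column order follows the first row.
    """
    rows = list(rows)
    if not rows:
        return []
    return [name for name in rows[0]
            if all(row.get(name) is None for row in rows)]


# Hypothetical sample shaped like the output above, trimmed to four columns.
sample = [
    {"kafkaoffset": None, "uuid": None,
     "source_filename": "ops-20160228_23_35_01.gz", "year": 2016},
    {"kafkaoffset": None, "uuid": None,
     "source_filename": "ops-20160228_23_35_01.gz", "year": 2016},
]

print(all_null_columns(sample))  # prints ['kafkaoffset', 'uuid']
```

With PySpark, each `Row` from `results_df.take(3)` can be converted with `row.asDict()` before being passed to the helper. A column that comes back entirely `None` in Spark but holds values in Hive is a candidate for the mismatch described here.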
> Running the same query in Hive:
> {noformat}
> c:hive2://xyz.com:100> SELECT * FROM trmdw_prod.opsconsole_ingest where year=2016 and month=2 and day=29 limit 3;
> +------------------------------------------+-----------------------------------+--------------------------------+-----------------------------------+---------------------------------------+---------------------------------------+----------------------------+----------------------------+-----------------------------+------------------------------+--------------------------------+-----------------------------+------------------------------------+-------------------------+--------------------------+------------------------+--+
> | opsconsole_ingest.kafkaoffsetgeneration | opsconsole_ingest.kafkapartition | opsconsole_ingest.kafkaoffset | opsconsole_ingest.uuid | opsconsole_ingest.mid | opsconsole_ingest.iid | opsconsole_ingest.product | opsconsole_ingest.utctime | opsconsole_ingest.statcode | opsconsole_ingest.statvalue | opsconsole_ingest.displayname | opsconsole_ingest.category | opsconsole_ingest.source_filename | opsconsole_ingest.year | opsconsole_ingest.month | opsconsole_ingest.day |
> +------------------------------------------+-----------------------------------+--------------------------------+-----------------------------------+---------------------------------------+---------------------------------------+----------------------------+----------------------------+-----------------------------+------------------------------+--------------------------------+-----------------------------+------------------------------------+-------------------------+--------------------------+------------------------+--+
> | 11.0 | 0.0 | 3.83399394E8 | EF0D03C409681B98646F316CA1088973 | 174f53fb-ca9b-d3f9-64e1-7631bf906817 | 00000000-0000-0000-0000-000000000000 | est | 2016-01-13T06:58:19 | 8 | 3.0 SP11 (8.110.7601.18923) | MSXML 3.0 Version | PC Information | ops-20160228_23_35_01.gz | 2016 | 2 | 29 |
> | 11.0 | 0.0 | 3.83399395E8 | EF0D03C409681B98646F316CA1088973 | 174f53fb-ca9b-d3f9-64e1-7631bf906817 | 00000000-0000-0000-0000-000000000000 | est | 2016-01-13T06:58:19 | 2 | GenuineIntel | CPU Vendor | PC Information | ops-20160228_23_35_01.gz | 2016 | 2 | 29 |
> | 11.0 | 0.0 | 3.83399396E8 | EF0D03C409681B98646F316CA1088973 | 174f53fb-ca9b-d3f9-64e1-7631bf906817 | 00000000-0000-0000-0000-000000000000 | est | 2016-01-13T06:58:19 | 141 | 4 | Screens | PC Information | ops-20160228_23_35_01.gz | 2016 | 2 | 29 |
> +------------------------------------------+-----------------------------------+--------------------------------+-----------------------------------+---------------------------------------+---------------------------------------+----------------------------+----------------------------+-----------------------------+------------------------------+--------------------------------+-----------------------------+------------------------------------+-------------------------+--------------------------+------------------------+--+
> 3 rows selected (1.252 seconds)
> {noformat}
> The attached logs show that Spark generates no error or warning messages.
> The table definition is also attached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org