I agree with you. Perhaps Spark is using a data type that Hive does not yet support or is not compatible with, and that is why it shows NULL.
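One way to check that hypothesis is to compare the schema Spark reads from the Parquet footer with the schema the metastore holds for the table. A minimal sketch, assuming a Spark 2.x session named `spark` and the Alluxio path used later in this thread:

```scala
// Sketch, assuming a Spark 2.x session `spark` and the table/path from the
// thread below. A type or column-name mismatch between these two schemas
// would explain the Hive-side reader returning NULL for every column.
val df = spark.read.parquet("alluxio://master2:19998/etl_info/TOPIC")
df.printSchema()                    // schema stored in the Parquet footer
spark.sql("DESCRIBE topic").show()  // schema the Hive metastore holds
```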
> On Jul 30, 2016, at 5:47 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
> I think it is still a Hive problem, because the Spark thrift server is
> basically a Hive thrift server.
>
> An acid test would be to log in to the Hive CLI or the Hive thrift server
> (you are actually using the Hive thrift server on port 10000 when using the
> Spark thrift server) and see whether you see the data.
>
> When you use Spark it should work.
>
> I still believe it is a bug in Hive.
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
> http://talebzadehmich.wordpress.com
>
> Disclaimer: Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
> On 30 July 2016 at 11:43, Chanh Le <giaosu...@gmail.com> wrote:
>
> Hi Mich,
> Thanks for the support. Here are some of my thoughts.
>
>> BTW can you log in to the thrift server and do select * from <TABLE> limit 10
>>
>> Do you see the rows?
>
> Yes, I can see the rows, but all the field values are NULL.
>
>> Works OK for me
>
> You only tested the number of rows. In my case I checked and it shows 117
> rows, but the problem is that the data is NULL in all fields.
>
>> As I see it, the issue is that the Hive table created as external on the
>> Parquet data somehow does not see the data. Rows are all nulls.
>>
>> I don't think this is specific to the thrift server. Just log in to Hive
>> and see whether you can read the data from your table topic created as
>> external.
>>
>> I noticed the same issue
>
> I don't think it's a Hive issue. Right now I am using Spark and Zeppelin.
>
> And the point is: why can the same Parquet file (which I converted from CSV
> to Parquet) be read in Spark but not in STS?
>
> One more thing: with the same file and the same method of creating the
> table in STS, it works fine in Spark 1.6.1.
>
> Regards,
> Chanh
>
>
>> On Jul 30, 2016, at 2:10 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>> BTW can you log in to the thrift server and do select * from <TABLE> limit 10
>>
>> Do you see the rows?
>>
>> Dr Mich Talebzadeh
>>
>>
>> On 30 July 2016 at 07:20, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>> Works OK for me
>>
>> scala> val df = sqlContext.read.format("com.databricks.spark.csv").option("inferSchema", "true").option("header", "false").load("hdfs://rhes564:9000/data/stg/accounts/ll/18740868")
>> df: org.apache.spark.sql.DataFrame = [C0: string, C1: string, C2: string, C3: string, C4: string, C5: string, C6: string, C7: string, C8: string]
>>
>> scala> df.write.mode("overwrite").parquet("/user/hduser/ll_18740868.parquet")
>>
>> scala> sqlContext.read.parquet("/user/hduser/ll_18740868.parquet").count
>> res2: Long = 3651
>>
>> scala> val ff = sqlContext.read.parquet("/user/hduser/ll_18740868.parquet")
>> ff: org.apache.spark.sql.DataFrame = [C0: string, C1: string, C2: string, C3: string, C4: string, C5: string, C6: string, C7: string, C8: string]
>>
>> scala> ff.take(5)
>> res3: Array[org.apache.spark.sql.Row] = Array([Transaction Date,Transaction Type,Sort Code,Account Number,Transaction Description,Debit Amount,Credit Amount,Balance,], [31/12/2009,CPT,'30-64-72,18740868,LTSB STH KENSINGTO CD 5710 31DEC09 ,90.00,,400.00,null], [31/12/2009,CPT,'30-64-72,18740868,LTSB CHELSEA (3091 CD 5710 31DEC09 ,10.00,,490.00,null], [31/12/2009,DEP,'30-64-72,18740868,CHELSEA ,,500.00,500.00,null], [Transaction Date,Transaction Type,Sort Code,Account Number,Transaction Description,Debit Amount,Credit Amount,Balance,])
>>
>> Now in Zeppelin create an external table and read it:
>>
>> <image.png>
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>>
>> On 29 July 2016 at 09:04, Chanh Le <giaosu...@gmail.com> wrote:
>>
>> I continued to debug:
>>
>> 16/07/29 13:57:35 INFO FileScanRDD: Reading File path: file:///Users/giaosudau/Documents/Topics.parquet/part-r-00000-8997050f-e063-427e-b53c-f0a61739706f.gz.parquet, range: 0-3118, partition values: [empty row]
>>
>> vs. the OK one:
>>
>> 16/07/29 15:02:47 INFO FileScanRDD: Reading File path: file:///Users/giaosudau/data_example/FACT_ADMIN_HOURLY/time=2016-07-24-18/network_id=30206/part-r-00000-c5f5e18d-c8a1-4831-8903-3c60b02bdfe8.snappy.parquet, range: 0-6050, partition values: [2016-07-24-18,30206]
>>
>> I attached 2 files.
>>
>>
>>> On Jul 29, 2016, at 9:44 AM, Chanh Le <giaosu...@gmail.com> wrote:
>>>
>>> Hi everyone,
>>>
>>> For further investigation, I attached the file that I converted from CSV to Parquet.
>>>
>>> Spark code:
>>>
>>> I loaded from the CSV file:
>>>
>>> val df = spark.sqlContext.read.format("com.databricks.spark.csv").option("delimiter", ",").option("header", "true").option("inferSchema", "true").load("/Users/giaosudau/Downloads/Topics.xls - Sheet 1.csv")
>>>
>>> I created a Parquet file:
>>>
>>> df.write.mode("overwrite").parquet("/Users/giaosudau/Documents/Topics.parquet")
>>>
>>> It's OK in spark-shell:
>>>
>>> scala> df.take(5)
>>> res22: Array[org.apache.spark.sql.Row] = Array([124,Nghệ thuật & Giải trí,Arts & Entertainment,0,124,1], [53,Scandal,Scandal,124,124,53,2], [54,Showbiz - World,Showbiz-World,124,124,54,2], [52,Âm nhạc,Entertainment-Music,124,124,52,2], [47,Bar - Karaoke - Massage,Bar-Karaoke-Massage-Prostitution,124,124,47,2])
>>>
>>> But when I create a table in STS:
>>>
>>> 0: jdbc:hive2://localhost:10000> CREATE EXTERNAL TABLE topic (TOPIC_ID int, TOPIC_NAME_VN String, TOPIC_NAME_EN String, PARENT_ID int, FULL_PARENT String, LEVEL_ID int) STORED AS PARQUET LOCATION '/Users/giaosudau/Documents/Topics.parquet';
>>>
>>> I get all results NULL:
>>>
>>> <Screen Shot 2016-07-29 at 9.42.26 AM.png>
>>>
>>> I think it's really a BUG, right?
>>>
>>> Regards,
>>> Chanh
>>>
>>> <Topics.parquet>
>>> <Topics.xls - Sheet 1.csv>
>>>
>>>
>>>> On Jul 28, 2016, at 4:25 PM, Chanh Le <giaosu...@gmail.com> wrote:
>>>>
>>>> Hi everyone,
>>>>
>>>> I have a problem when I create an external table in the Spark Thrift Server (STS) and query the data.
>>>>
>>>> Scenario:
>>>> Spark 2.0
>>>> Alluxio 1.2.0
>>>> Zeppelin 0.7.0
>>>>
>>>> STS start script:
>>>>
>>>> /home/spark/spark-2.0.0-bin-hadoop2.6/sbin/start-thriftserver.sh --master mesos://zk://master1:2181,master2:2181,master3:2181/mesos --conf spark.driver.memory=5G --conf spark.scheduler.mode=FAIR --class org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 --jars /home/spark/spark-2.0.0-bin-hadoop2.6/jars/alluxio-core-client-spark-1.2.0-jar-with-dependencies.jar --total-executor-cores 35 spark-internal --hiveconf hive.server2.thrift.port=10000 --hiveconf hive.metastore.warehouse.dir=/user/hive/warehouse --hiveconf hive.metastore.metadb.dir=/user/hive/metadb --conf spark.sql.shuffle.partitions=20
>>>>
>>>> I have a file stored in Alluxio at alluxio://master2:19998/etl_info/TOPIC,
>>>> and I create a table in STS with:
>>>>
>>>> CREATE EXTERNAL TABLE topic (topic_id int, topic_name_vn String, topic_name_en String, parent_id int, full_parent String, level_id int) STORED AS PARQUET LOCATION 'alluxio://master2:19998/etl_info/TOPIC';
>>>>
>>>> To compare STS with Spark, I create a temp table named topics:
>>>>
>>>> spark.sqlContext.read.parquet("alluxio://master2:19998/etl_info/TOPIC").registerTempTable("topics")
>>>>
>>>> Then I run the query on both and compare:
>>>>
>>>> <Screen Shot 2016-07-28 at 4.18.59 PM.png>
>>>>
>>>> As you can see, the result is different.
>>>> Is that a bug, or did I do something wrong?
>>>>
>>>> Regards,
>>>> Chanh
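One failure mode that produces exactly this symptom (row count correct, every column NULL) is a column-name mismatch between the table DDL and the names stored in the Parquet footer, since Parquet readers resolve columns by name and unmatched columns come back as null. A hedged way to test that hypothesis with the thread's paths (the `TOPIC_LC` output path is made up for the experiment, and a Spark session `spark` is assumed):

```scala
// Sketch: rewrite the file with column names lower-cased to match the Hive
// DDL, then point a fresh external table at the new location and re-query
// via STS. If the NULLs disappear, the cause was name resolution, not a bug.
val df = spark.read.parquet("alluxio://master2:19998/etl_info/TOPIC")
val lowered = df.toDF(df.columns.map(_.toLowerCase): _*)  // rename all columns
lowered.write.mode("overwrite")
  .parquet("alluxio://master2:19998/etl_info/TOPIC_LC")   // hypothetical path
```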