at last some progress :)

Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com


On 15 June 2016 at 10:52, Lee Ho Yeung <jobmatt...@gmail.com> wrote:

> Hi Mich,
>
> I found the cause of my problem now: I had missed setting the delimiter,
> which is a tab.
>
> But it still gets an error. I also notice that only LibreOffice opens and
> reads the file well; even Excel on Windows cannot separate it into the
> proper columns.
>
> scala> val df =
> sqlContext.read.format("com.databricks.spark.csv").option("header",
> "true").option("inferSchema", "true").option("delimiter",
> "").load("/home/martin/result002.csv")
> java.lang.StringIndexOutOfBoundsException: String index out of range: 0
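The StringIndexOutOfBoundsException above points at the empty delimiter
string: spark-csv presumably reads the first character of that option, and
"" has none. A minimal sketch of the tab-delimited read, assuming the same
spark-csv 1.4.0 package and file path used in this thread:

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)

// "\t" in a Scala string literal is a real tab character; an empty
// delimiter is what triggers the StringIndexOutOfBoundsException.
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .option("delimiter", "\t")
  .load("/home/martin/result002.csv")

df.printSchema()  // the tab-separated header should now split into a0 .. a9

Using the \t escape also avoids pasting a literal tab into the shell, where
tab completion can swallow it and leave the option empty.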
> On Wed, Jun 15, 2016 at 12:14 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
>> There may be an issue with the data in your csv file, like a blank
>> header line etc.
>>
>> Sounds like you have an issue there. I normally get rid of blank lines
>> before putting a csv file in HDFS.
>>
>> Can you actually select from that temp table? Like:
>>
>> sql("select TransactionDate, TransactionType, Description, Value,
>> Balance, AccountName, AccountNumber from tmp").take(2)
>>
>> Replace those with your column names. They are mapped using a case class.
>>
>> HTH
>>
>> Dr Mich Talebzadeh
>>
>> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>>
>> http://talebzadehmich.wordpress.com
>>
>> On 15 June 2016 at 03:02, Lee Ho Yeung <jobmatt...@gmail.com> wrote:
>>
>>> filter also has an error:
>>>
>>> 16/06/14 19:00:27 WARN Utils: Service 'SparkUI' could not bind on port
>>> 4040. Attempting port 4041.
>>> Spark context available as sc.
>>> SQL context available as sqlContext.
>>>
>>> scala> import org.apache.spark.sql.SQLContext
>>> import org.apache.spark.sql.SQLContext
>>>
>>> scala> val sqlContext = new SQLContext(sc)
>>> sqlContext: org.apache.spark.sql.SQLContext =
>>> org.apache.spark.sql.SQLContext@3114ea
>>>
>>> scala> val df =
>>> sqlContext.read.format("com.databricks.spark.csv").option("header",
>>> "true").option("inferSchema", "true").load("/home/martin/result002.csv")
>>> 16/06/14 19:00:32 WARN SizeEstimator: Failed to check whether
>>> UseCompressedOops is set; assuming yes
>>> Java HotSpot(TM) Client VM warning: You have loaded library
>>> /tmp/libnetty-transport-native-epoll7823347435914767500.so which might
>>> have disabled stack guard. The VM will try to fix the stack guard now.
>>> It's highly recommended that you fix the library with 'execstack -c
>>> <libfile>', or link it with '-z noexecstack'.
>>> df: org.apache.spark.sql.DataFrame = [a0 a1 a2 a3 a4 a5 a6 a7 a8 a9: string]
>>>
>>> scala> df.printSchema()
>>> root
>>>  |-- a0 a1 a2 a3 a4 a5 a6 a7 a8 a9: string (nullable = true)
>>>
>>> scala> df.registerTempTable("sales")
>>>
>>> scala> df.filter($"a0".contains("found
>>> deep=1")).filter($"a1".contains("found
>>> deep=1")).filter($"a2".contains("found deep=1"))
>>> org.apache.spark.sql.AnalysisException: cannot resolve 'a0' given input
>>> columns: [a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 ];
>>> at
>>> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>>>
>>> On Tue, Jun 14, 2016 at 6:19 PM, Lee Ho Yeung <jobmatt...@gmail.com> wrote:
>>>
>>>> after trying the following commands, it still cannot show the data:
>>>>
>>>> https://drive.google.com/file/d/0Bxs_ao6uuBDUVkJYVmNaUGx2ZUE/view?usp=sharing
>>>> https://drive.google.com/file/d/0Bxs_ao6uuBDUc3ltMVZqNlBUYVk/view?usp=sharing
>>>>
>>>> /home/martin/Downloads/spark-1.6.1/bin/spark-shell --packages
>>>> com.databricks:spark-csv_2.11:1.4.0
>>>>
>>>> import org.apache.spark.sql.SQLContext
>>>>
>>>> val sqlContext = new SQLContext(sc)
>>>> val df =
>>>> sqlContext.read.format("com.databricks.spark.csv").option("header",
>>>> "true").option("inferSchema", "true").load("/home/martin/result002.csv")
>>>> df.printSchema()
>>>> df.registerTempTable("sales")
>>>> val aggDF = sqlContext.sql("select * from sales where a0 like
>>>> \"%deep=3%\"")
>>>> df.collect.foreach(println)
>>>> aggDF.collect.foreach(println)
>>>>
>>>> val df =
>>>> sqlContext.read.format("com.databricks.spark.csv").option("header",
>>>> "true").load("/home/martin/result002.csv")
>>>> df.printSchema()
>>>> df.registerTempTable("sales")
>>>> sqlContext.sql("select * from sales").take(30).foreach(println)
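The AnalysisException in the 03:02 message is the same delimiter problem
seen from the other side: read with the default comma delimiter, the whole
tab-separated header line becomes one column literally named
"a0 a1 a2 ... a9", so there is no column a0 to resolve. A sketch of the
filter and the like-query once the tab delimiter is set, under the same
assumptions as above (result002.csv with a header row of a0 .. a9):

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._  // brings the $"colName" syntax into scope

val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .option("delimiter", "\t")   // splits the header into ten columns
  .load("/home/martin/result002.csv")

// With real a0 .. a9 columns, the chained filters resolve:
df.filter($"a0".contains("found deep=1"))
  .filter($"a1".contains("found deep=1"))
  .filter($"a2".contains("found deep=1"))
  .show()

// ... as does the SQL version, using single quotes for the literal:
df.registerTempTable("sales")
sqlContext.sql("select * from sales where a0 like '%deep=3%'").take(30).foreach(println)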