Hi Mich, I found the cause of my problem now: I missed setting the delimiter, which is a tab,
but it still got an error. I also noticed that only LibreOffice opens and reads the file correctly; even Excel on Windows still cannot separate the columns into a proper format.

scala> val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").option("delimiter", "").load("/home/martin/result002.csv")
java.lang.StringIndexOutOfBoundsException: String index out of range: 0

On Wed, Jun 15, 2016 at 12:14 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> there may be an issue with the data in your csv file, like a blank header
> line etc.
>
> sounds like you have an issue there. I normally get rid of blank lines
> before putting a csv file in HDFS.
>
> can you actually select from that temp table? like
>
> sql("select TransactionDate, TransactionType, Description, Value, Balance,
> AccountName, AccountNumber from tmp").take(2)
>
> replace those with your column names. they are mapped using a case class.
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> On 15 June 2016 at 03:02, Lee Ho Yeung <jobmatt...@gmail.com> wrote:
>
>> filter also has an error:
>>
>> 16/06/14 19:00:27 WARN Utils: Service 'SparkUI' could not bind on port
>> 4040. Attempting port 4041.
>> Spark context available as sc.
>> SQL context available as sqlContext.
>>
>> scala> import org.apache.spark.sql.SQLContext
>> import org.apache.spark.sql.SQLContext
>>
>> scala> val sqlContext = new SQLContext(sc)
>> sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@3114ea
>>
>> scala> val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("/home/martin/result002.csv")
>> 16/06/14 19:00:32 WARN SizeEstimator: Failed to check whether UseCompressedOops is set; assuming yes
>> Java HotSpot(TM) Client VM warning: You have loaded library /tmp/libnetty-transport-native-epoll7823347435914767500.so which might have disabled stack guard. The VM will try to fix the stack guard now.
>> It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
>> df: org.apache.spark.sql.DataFrame = [a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 : string]
>>
>> scala> df.printSchema()
>> root
>>  |-- a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 : string (nullable = true)
>>
>> scala> df.registerTempTable("sales")
>>
>> scala> df.filter($"a0".contains("found deep=1")).filter($"a1".contains("found deep=1")).filter($"a2".contains("found deep=1"))
>> org.apache.spark.sql.AnalysisException: cannot resolve 'a0' given input columns: [a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 ];
>>   at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>>
>> On Tue, Jun 14, 2016 at 6:19 PM, Lee Ho Yeung <jobmatt...@gmail.com> wrote:
>>
>>> after trying the following commands, it cannot show the data:
>>>
>>> https://drive.google.com/file/d/0Bxs_ao6uuBDUVkJYVmNaUGx2ZUE/view?usp=sharing
>>> https://drive.google.com/file/d/0Bxs_ao6uuBDUc3ltMVZqNlBUYVk/view?usp=sharing
>>>
>>> /home/martin/Downloads/spark-1.6.1/bin/spark-shell --packages com.databricks:spark-csv_2.11:1.4.0
>>>
>>> import org.apache.spark.sql.SQLContext
>>>
>>> val sqlContext = new SQLContext(sc)
>>> val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("/home/martin/result002.csv")
>>> df.printSchema()
>>> df.registerTempTable("sales")
>>> val aggDF = sqlContext.sql("select * from sales where a0 like \"%deep=3%\"")
>>> df.collect.foreach(println)
>>> aggDF.collect.foreach(println)
>>>
>>> val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load("/home/martin/result002.csv")
>>> df.printSchema()
>>> df.registerTempTable("sales")
>>> sqlContext.sql("select * from sales").take(30).foreach(println)
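For what it's worth, the `StringIndexOutOfBoundsException` above appears to come from passing an empty string as the `delimiter` option (spark-csv reads the first character of that value). A tab has to be given explicitly as `"\t"`. Here is a sketch of the corrected read, reusing the file path and spark-csv 1.4.0 package already shown in this thread; it assumes the same Spark 1.6 shell session with `sc` available:

```scala
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)

// "\t" (a single tab character), not "" — an empty delimiter is what
// triggered the StringIndexOutOfBoundsException earlier in the thread.
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .option("delimiter", "\t")
  .load("/home/martin/result002.csv")

// With the delimiter set, the header row splits into separate columns,
// so a0 .. a9 become individually resolvable instead of one merged column.
df.printSchema()
df.registerTempTable("sales")
sqlContext.sql("select * from sales where a0 like \"%deep=3%\"").take(30).foreach(println)
```

This would also explain the earlier `cannot resolve 'a0'` error: without a delimiter the whole tab-separated header became a single column literally named `a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 `, so no column called plain `a0` existed to filter on.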