Hi Mich, I found the cause of my problem now: I missed setting the delimiter, which is a tab,
but it still got an error. I also noticed that only LibreOffice opens and reads the file correctly; even Excel on Windows still cannot separate the columns into a proper format.

scala> val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").option("delimiter", "").load("/home/martin/result002.csv")
java.lang.StringIndexOutOfBoundsException: String index out of range: 0

On Wed, Jun 15, 2016 at 12:14 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> there may be an issue with the data in your csv file, like a blank header
> line etc.
>
> sounds like you have an issue there. I normally get rid of blank lines
> before putting a csv file in HDFS.
>
> can you actually select from that temp table? like
>
> sql("select TransactionDate, TransactionType, Description, Value, Balance,
> AccountName, AccountNumber from tmp").take(2)
>
> replace those with your column names. they are mapped using a case class.
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> On 15 June 2016 at 03:02, Lee Ho Yeung <jobmatt...@gmail.com> wrote:
>
>> filter also has an error:
>>
>> 16/06/14 19:00:27 WARN Utils: Service 'SparkUI' could not bind on port
>> 4040. Attempting port 4041.
>> Spark context available as sc.
>> SQL context available as sqlContext.
>>
>> scala> import org.apache.spark.sql.SQLContext
>> import org.apache.spark.sql.SQLContext
>>
>> scala> val sqlContext = new SQLContext(sc)
>> sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@3114ea
>>
>> scala> val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("/home/martin/result002.csv")
>> 16/06/14 19:00:32 WARN SizeEstimator: Failed to check whether UseCompressedOops is set; assuming yes
>> Java HotSpot(TM) Client VM warning: You have loaded library /tmp/libnetty-transport-native-epoll7823347435914767500.so which might have disabled stack guard. The VM will try to fix the stack guard now.
>> It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
>> df: org.apache.spark.sql.DataFrame = [a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 : string]
>>
>> scala> df.printSchema()
>> root
>>  |-- a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 : string (nullable = true)
>>
>> scala> df.registerTempTable("sales")
>>
>> scala> df.filter($"a0".contains("found deep=1")).filter($"a1".contains("found deep=1")).filter($"a2".contains("found deep=1"))
>> org.apache.spark.sql.AnalysisException: cannot resolve 'a0' given input columns: [a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 ];
>>   at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
>>
>> On Tue, Jun 14, 2016 at 6:19 PM, Lee Ho Yeung <jobmatt...@gmail.com> wrote:
>>
>>> after trying the following commands, it cannot show the data:
>>>
>>> https://drive.google.com/file/d/0Bxs_ao6uuBDUVkJYVmNaUGx2ZUE/view?usp=sharing
>>> https://drive.google.com/file/d/0Bxs_ao6uuBDUc3ltMVZqNlBUYVk/view?usp=sharing
>>>
>>> /home/martin/Downloads/spark-1.6.1/bin/spark-shell --packages com.databricks:spark-csv_2.11:1.4.0
>>>
>>> import org.apache.spark.sql.SQLContext
>>>
>>> val sqlContext = new SQLContext(sc)
>>> val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load("/home/martin/result002.csv")
>>> df.printSchema()
>>> df.registerTempTable("sales")
>>> val aggDF = sqlContext.sql("select * from sales where a0 like \"%deep=3%\"")
>>> df.collect.foreach(println)
>>> aggDF.collect.foreach(println)
>>>
>>> val df = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").load("/home/martin/result002.csv")
>>> df.printSchema()
>>> df.registerTempTable("sales")
>>> sqlContext.sql("select * from sales").take(30).foreach(println)
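For what it's worth, the `StringIndexOutOfBoundsException` above appears to come from passing an empty string as the `delimiter` option (spark-csv reads the first character of that value). A tab has to be given explicitly as `"\t"`. Here is a sketch of the corrected read, reusing the file path and spark-csv 1.4.0 package already shown in this thread; it assumes the same Spark 1.6 shell session with `sc` available:

```scala
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)

// "\t" (a single tab character), not "" — an empty delimiter is what
// triggered the StringIndexOutOfBoundsException earlier in the thread.
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .option("delimiter", "\t")
  .load("/home/martin/result002.csv")

// With the delimiter set, the header row splits into separate columns,
// so a0 .. a9 become individually resolvable instead of one merged column.
df.printSchema()
df.registerTempTable("sales")
sqlContext.sql("select * from sales where a0 like \"%deep=3%\"").take(30).foreach(println)
```

This would also explain the earlier `cannot resolve 'a0'` error: without a delimiter the whole tab-separated header became a single column literally named `a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 `, so no column called plain `a0` existed to filter on.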