Hi all,

    I'm learning how to get started with CarbonData, following the 
tutorial: https://cwiki.apache.org/confluence/display/CARBONDATA/Quick+Start.


    I created a file named sample.csv under /home/hadoop/carbondata on 
the master node, and then ran the following in the spark shell:


scala> import java.io.File
scala> val dataFilePath = new File("../carbondata/sample.csv").getCanonicalPath
scala> cc.sql(s"load data inpath '$dataFilePath' into table test_table")


but it throws an InvalidInputException even though the file actually exists. 
Here are the script and the logs:


scala> val dataFilePath = new File("../carbondata/sample.csv").getCanonicalPath
dataFilePath: String = /home/hadoop/carbondata/sample.csv


scala> cc.sql(s"load data inpath '$dataFilePath' into table test_table")
INFO  19-12 20:18:22,991 - main Query [LOAD DATA INPATH 
'/HOME/HADOOP/CARBONDATA/SAMPLE.CSV' INTO TABLE TEST_TABLE]
INFO  19-12 20:18:23,271 - Successfully able to get the table metadata file lock
INFO  19-12 20:18:23,276 - main Initiating Direct Load for the Table : 
(default.test_table)
INFO  19-12 20:18:23,279 - main Generate global dictionary from source data 
files!
INFO  19-12 20:18:23,296 - main [Block Distribution]
INFO  19-12 20:18:23,297 - main totalInputSpaceConsumed: 74 , 
defaultParallelism: 28
INFO  19-12 20:18:23,297 - main mapreduce.input.fileinputformat.split.maxsize: 
16777216
INFO  19-12 20:18:23,380 - Block broadcast_0 stored as values in memory 
(estimated size 137.1 KB, free 137.1 KB)
INFO  19-12 20:18:23,397 - Block broadcast_0_piece0 stored as bytes in memory 
(estimated size 15.0 KB, free 152.1 KB)
INFO  19-12 20:18:23,398 - Added broadcast_0_piece0 in memory on 
172.17.195.12:46335 (size: 15.0 KB, free: 511.1 MB)
INFO  19-12 20:18:23,399 - Created broadcast 0 from NewHadoopRDD at 
CarbonTextFile.scala:73
ERROR 19-12 20:18:23,431 - main generate global dictionary failed
org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does 
not exist: /home/hadoop/carbondata/sample.csv
        at 
org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:285)
        at 
org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:340)
        at 
org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:113)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
        ...
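

My guess (and it is only a guess) is that the unqualified path is being 
resolved against fs.defaultFS, so Spark is looking for the file on HDFS 
instead of on the local filesystem of the master node. If that is the case, 
would explicitly qualifying the scheme be the right fix? Something like the 
following (just a sketch on my part, not from the tutorial):

scala> // qualify the scheme so Hadoop reads from the local filesystem
scala> cc.sql(s"load data inpath 'file://$dataFilePath' into table test_table")

Or should I first copy the file to HDFS with hadoop fs -put and load it from 
there instead?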


Has anyone else run into this problem? Could you tell me why it happens? 
Looking forward to your replies. Thanks!
