Hi CarbonData experts,
I'm new to both Spark and CarbonData.
I'm trying to use CarbonData to store some key-value pairs on HDFS. To
start with, I issued a few commands in the Spark shell to help me better
understand its behavior.
Here is how I launched the Spark shell:
===
spark-shell --spark-version 2.3.0 --conf spark.hive.support=true \
  --driver-memory 2G --num-executors 50 --executor-cores 2 \
  --executor-memory 2G \
  --jars apache-carbondata-1.5.2-bin-spark2.3.2-hadoop2.7.2.jar
Here is how I issued the commands:
===
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._
import org.apache.spark.sql.CarbonDataFrameWriter
import org.apache.spark.sql.types._

// Create a CarbonSession with the store path on HDFS and a local metastore path
val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("hdfs://:9000/user/kuyu/carbondata3", "/export/home/kuyu/wenye")

// Schema of the key-value CSV
val schema = StructType(Array(
  StructField("keyCol", StringType, false),
  StructField("deltaCol", LongType, false),
  StructField("__opalSegmentId", IntegerType, false),
  StructField("__opalSegmentOffset", IntegerType, false)))

// Load the CSV into a DataFrame
val keyStoreDF = carbon.read.format("csv").option("header", "true").schema(schema).load("hdfs://:9000/user/kuyu/keystore.csv")

// Write the DataFrame out as a CarbonData table named wenye_xyz
val carbonDFWriter = new CarbonDataFrameWriter(carbon.sqlContext, keyStoreDF)
val options = Map("tableName" -> "wenye_xyz")
carbonDFWriter.saveAsCarbonFile(options)
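For reference, the CarbonData docs also describe doing this write through
the standard DataFrameWriter API, which I assume is equivalent to
constructing CarbonDataFrameWriter by hand:
===
// Sketch based on the documented DataFrameWriter integration; assumed to
// be equivalent to calling CarbonDataFrameWriter directly.
import org.apache.spark.sql.SaveMode

keyStoreDF.write.
  format("carbondata").
  option("tableName", "wenye_xyz").
  mode(SaveMode.Overwrite).
  save()
===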
What I found:
'Fact', 'LockFiles', and 'Metadata' directories were created under
hdfs://:9000/user/kuyu/carbondata3/wenye_xyz. However, I couldn't find
/export/home/kuyu/wenye created anywhere. I read that Carbon uses a Derby
DB by default, which should create /export/home/kuyu/wenye on the local
disk. Is my understanding correct?
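Here is a quick sketch of how the metastore path could be checked from the
same shell session (it assumes Derby would materialize a metastore_db
directory plus a derby.log under that path, which is exactly the
assumption I'd like confirmed):
===
// Hypothetical check from the driver: inspect the local metastore path.
// Unconfirmed assumption: Derby creates "metastore_db" and "derby.log"
// under /export/home/kuyu/wenye.
import java.io.File

val metaDir = new File("/export/home/kuyu/wenye")
println(s"exists = ${metaDir.exists}")
if (metaDir.exists) metaDir.listFiles.foreach(f => println(f.getName))
===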
Thanks,
KY