Thanks for replying. I was unable to figure out how, after using
jsonFile/jsonRDD, I can load the data into a Hive table. I was able to
save the SchemaRDD I got via hiveContext.sql(...).saveAsParquetFile(path),
i.e. save the SchemaRDD as a Parquet file, but when I tried to fetch data
back from the Parquet file I didn't get the records I expected.
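For reference, a minimal sketch of writing a Parquet file and reading the records back with a Spark 1.x HiveContext might look like the following (the table name and path here are hypothetical, and registerTempTable was called registerAsTable in Spark 1.0):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext("local", "parquet-example")
val hiveContext = new HiveContext(sc)

// Save the result of a query as a Parquet file.
val schemaRdd = hiveContext.sql("SELECT * FROM some_table")
schemaRdd.saveAsParquetFile("/tmp/some_table.parquet")

// Read the Parquet file back in; the result is again a SchemaRDD,
// so collect() (not printSchema()) gives the actual records.
val loaded = hiveContext.parquetFile("/tmp/some_table.parquet")
loaded.registerTempTable("loaded_table")
hiveContext.sql("SELECT * FROM loaded_table").collect().foreach(println)
```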
Sorry about the confusion I created; I only started learning this week.
Silly me, I was actually writing the schema to a text file and expecting
records. This is what I was supposed to do. Also, could you let me know
how to add the data from the jsonFile/jsonRDD methods of hiveContext into Hive?
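One approach, sketched below assuming a Spark 1.x HiveContext (the JSON path and table names are hypothetical), is to register the SchemaRDD returned by jsonFile as a temporary table and then populate a Hive table from it with HiveQL:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext("local", "json-to-hive")
val hiveContext = new HiveContext(sc)

// jsonFile infers the schema and returns a SchemaRDD.
val people = hiveContext.jsonFile("/tmp/people.json")
people.registerTempTable("people_json")

// Create a real Hive table from the temporary table...
hiveContext.sql("CREATE TABLE people_hive AS SELECT * FROM people_json")
// ...or, if the Hive table already exists:
// hiveContext.sql("INSERT INTO TABLE people_hive SELECT * FROM people_json")
```

SchemaRDD also has an insertInto("people_hive") method that does the same as the INSERT statement when the target table already exists.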
I am using Apache Hadoop 1.2.1 and wanted to use Spark SQL with Hive, so I
tried to build Spark like so:
mvn -Phive,hadoop-1.2 -Dhadoop.version=1.2.1 clean -DskipTests package
But I got the following error:
The requested profile "hadoop-1.2" could not be activated because it does
not exist.
Oops, I guess this is the right way to do it:
mvn -Phive -Dhadoop.version=1.2.1 clean -DskipTests package
As of now my approach is to fetch all the data from tables located in
different databases into separate RDDs, then make a union of them and query
the union. I want to know whether I can perform the query directly while
creating a single RDD, i.e. instead of creating two RDDs and firing two
separate queries, query the two tables together.
I want to be able to perform a query on two tables in different databases,
and I want to know whether it can be done. I've heard about the union of two
RDDs, but here I want to connect to something more like different partitions
of a table.
Any help is appreciated.
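If both databases live in the same Hive metastore, it may help that HiveQL lets you qualify a table name with its database, so a single query (and hence a single SchemaRDD) can span both tables without a manual union. A sketch with hypothetical database and table names:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext("local", "cross-db-query")
val hiveContext = new HiveContext(sc)

// db1.orders and db2.customers are hypothetical tables in two
// different Hive databases; qualifying each with its database
// name lets one query join across both.
val joined = hiveContext.sql(
  """SELECT o.id, c.name
    |FROM db1.orders o
    |JOIN db2.customers c ON o.customer_id = c.id""".stripMargin)
joined.collect().foreach(println)
```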
import java.io.Serializable;
//import
So far I have tried this and I am able to compile it successfully. There
isn't enough documentation on Spark for its usage with databases. I am using
AbstractFunction0 and AbstractFunction1 here. I am unable to access the
database: the jar just runs without doing anything when submitted. I want
to know what I am missing.
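The mention of AbstractFunction0/AbstractFunction1 suggests the JdbcRDD constructor, which from Java forces you to wrap the connection factory and row mapper in those classes; in Scala the same thing is much shorter. A sketch, assuming a hypothetical MySQL table people(id INT, name VARCHAR) and that the JDBC driver jar is shipped with the job (a missing driver, or a query without the two bound placeholders, is a common reason a submitted jar silently does nothing):

```scala
import java.sql.{DriverManager, ResultSet}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.JdbcRDD

val sc = new SparkContext("local", "jdbc-example")

// The query must contain two '?' placeholders, which JdbcRDD fills in
// with each partition's lower and upper bound.
val rdd = new JdbcRDD(
  sc,
  () => {
    Class.forName("com.mysql.jdbc.Driver") // driver jar must be on the executor classpath
    DriverManager.getConnection("jdbc:mysql://localhost:3306/testdb", "user", "password")
  },
  "SELECT id, name FROM people WHERE id >= ? AND id <= ?",
  1, 100, 3, // lowerBound, upperBound, numPartitions
  (rs: ResultSet) => (rs.getInt("id"), rs.getString("name")))

rdd.collect().foreach(println)
```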