Re: How to apply schema to queried data from Hive before saving it as parquet file?

2014-11-19 Thread akshayhazari
Thanks for replying. I was unable to figure out how, after using jsonFile/jsonRDD, to load the data into a Hive table. Also, I was able to save the SchemaRDD I got via hiveContext.sql(...).saveAsParquetFile(path), i.e. save the SchemaRDD as a Parquet file, but when I tried to fetch data from the Parquet file
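For reference, a minimal Spark 1.1-era Scala sketch of the round trip described above; the Hive table name `src` and the path `/tmp/src.parquet` are placeholders, and this is an illustration rather than the poster's actual code:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object ParquetRoundTrip {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ParquetRoundTrip"))
    val hiveContext = new HiveContext(sc)

    // Query Hive; the result is a SchemaRDD that carries its schema with it.
    val schemaRdd = hiveContext.sql("SELECT * FROM src")

    // Save the SchemaRDD, schema included, as a Parquet file.
    schemaRdd.saveAsParquetFile("/tmp/src.parquet")

    // Read it back: parquetFile recovers the schema from the Parquet footer,
    // so the result can be registered and queried like any other table.
    val fromParquet = hiveContext.parquetFile("/tmp/src.parquet")
    fromParquet.registerTempTable("src_parquet")
    hiveContext.sql("SELECT COUNT(*) FROM src_parquet").collect().foreach(println)
  }
}
```

Because the schema travels inside the Parquet file itself, no separate schema needs to be applied when reading it back.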

Re: How to apply schema to queried data from Hive before saving it as parquet file?

2014-11-19 Thread akshayhazari
Sorry about the confusion I created. I just started learning this week. Silly me, I was actually writing the schema to a txt file and expecting records. This is what I was supposed to do. Also, if you could let me know about adding the data from the jsonFile/jsonRDD methods of hiveContext to Hive
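One way to get jsonFile/jsonRDD data into Hive with that era's API, sketched under the assumption of a hypothetical input file `people.json`: load it, register the inferred SchemaRDD as a temporary table, then use HiveQL to materialize a real Hive table from it.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object JsonToHive {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("JsonToHive"))
    val hiveContext = new HiveContext(sc)

    // jsonFile infers a schema from the JSON records and returns a SchemaRDD.
    val json = hiveContext.jsonFile("people.json")
    json.printSchema() // prints the inferred schema, not the records

    // Expose the SchemaRDD to SQL, then create and populate a Hive table from it.
    json.registerTempTable("people_json")
    hiveContext.sql("CREATE TABLE people AS SELECT * FROM people_json")
  }
}
```

Note the distinction the thread turns on: printSchema shows the schema only; to see records you need an action such as collect, or a query against the registered table.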

Building Spark for Hive The requested profile hadoop-1.2 could not be activated because it does not exist.

2014-11-17 Thread akshayhazari
I am using Apache Hadoop 1.2.1. I wanted to use Spark SQL with Hive, so I tried to build Spark like so: mvn -Phive,hadoop-1.2 -Dhadoop.version=1.2.1 clean -DskipTests package. But I get the following error: The requested profile hadoop-1.2 could not be activated because it does not

Re: Building Spark for Hive The requested profile hadoop-1.2 could not be activated because it does not exist.

2014-11-17 Thread akshayhazari
Oops, I guess this is the right way to do it: mvn -Phive -Dhadoop.version=1.2.1 clean -DskipTests package
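The error is expected: the Spark build of that era defines Maven profiles only for Hadoop 2.x variants (e.g. hadoop-2.4); there is no hadoop-1.2 profile. For Hadoop 1.x you select the version with the hadoop.version property alone, which is exactly the corrected command:

```shell
# No Hadoop profile exists (or is needed) for Hadoop 1.x builds;
# just override the hadoop.version property.
mvn -Phive -Dhadoop.version=1.2.1 -DskipTests clean package
```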

Query from two or more tables Spark Sql .I have done this . Is there any simpler solution.

2014-11-12 Thread akshayhazari
As of now, my approach is to fetch all the data from tables located in different databases into separate RDDs, then take a union of them and query them together. I want to know whether I can perform a query on it directly along with creating an RDD, i.e. instead of creating two RDDs, firing
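In Spark 1.1-era Spark SQL there is no built-in federation across two JDBC sources, so the two fetches cannot be skipped, but the union and the query can be kept compact. A hedged sketch, using in-memory stand-ins (a hypothetical Person case class) where the real code would use the two per-database fetches:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Case class must be top-level so schema inference via reflection works.
case class Person(name: String, age: Int)

object UnionQuery {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("UnionQuery"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD

    // Stand-ins for the two per-database fetches (in practice, JdbcRDDs).
    val rdd1: org.apache.spark.sql.SchemaRDD = sc.parallelize(Seq(Person("alice", 30)))
    val rdd2: org.apache.spark.sql.SchemaRDD = sc.parallelize(Seq(Person("bob", 25)))

    // One union, one registered table, one query over both sources.
    rdd1.unionAll(rdd2).registerTempTable("people")
    sqlContext.sql("SELECT name FROM people WHERE age > 20").collect().foreach(println)
  }
}
```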

Combining data from two tables in two databases postgresql, JdbcRDD.

2014-11-11 Thread akshayhazari
I want to be able to perform a query on two tables in different databases, and I want to know whether it can be done. I've heard about taking the union of two RDDs, but here I want to connect to something like different partitions of a table. Any help is appreciated. import java.io.Serializable; //import
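JdbcRDD itself has no notion of spanning two databases, but one JdbcRDD per database can be built and then unioned. A sketch under assumed connection URLs and a hypothetical `users(id, name)` table in each PostgreSQL database:

```scala
import java.sql.{DriverManager, ResultSet}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.JdbcRDD

object TwoDbUnion {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("TwoDbUnion"))

    def rddFor(url: String) = new JdbcRDD(
      sc,
      () => {
        Class.forName("org.postgresql.Driver")
        DriverManager.getConnection(url)
      },
      // JdbcRDD requires exactly two '?' placeholders; Spark binds them
      // to each partition's slice of the lowerBound..upperBound key range.
      "SELECT id, name FROM users WHERE id >= ? AND id <= ?",
      1, 1000000, 4,
      (rs: ResultSet) => (rs.getLong(1), rs.getString(2)))

    // Union the per-database RDDs, then query or aggregate the combined data.
    val both = rddFor("jdbc:postgresql://host1/db1") union rddFor("jdbc:postgresql://host2/db2")
    println(both.count())
  }
}
```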

Mysql retrieval and storage using JdbcRDD

2014-11-10 Thread akshayhazari
So far I have tried this and I am able to compile it successfully. There isn't enough documentation on using Spark with databases. I am using AbstractFunction0 and AbstractFunction1 here. I am unable to access the database; the jar just runs without doing anything when submitted. I want
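Two common causes of a JdbcRDD job that appears to run without doing anything: the JDBC driver jar is not shipped to the executors, or no action is ever invoked, so the lazy RDD is never computed. In Scala, plain closures can be passed where the Java API forces AbstractFunction0/AbstractFunction1. A hedged MySQL sketch with assumed connection details and a hypothetical `people(id, name)` table:

```scala
import java.sql.{DriverManager, ResultSet}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.JdbcRDD

object MysqlRead {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("MysqlRead"))

    val rows = new JdbcRDD(
      sc,
      () => {
        // The driver jar must reach the executors, e.g. via spark-submit --jars.
        Class.forName("com.mysql.jdbc.Driver")
        DriverManager.getConnection("jdbc:mysql://localhost/test", "user", "pass")
      },
      "SELECT id, name FROM people WHERE id >= ? AND id <= ?",
      1, 100, 2,
      (rs: ResultSet) => (rs.getInt(1), rs.getString(2)))

    // Without an action, the RDD is never evaluated and the job "does nothing".
    rows.collect().foreach(println)
  }
}
```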