Re: How to apply schema to queried data from Hive before saving it as parquet file?

2014-11-19 Thread akshayhazari
Thanks for replying. I was unable to figure out how, after using
jsonFile/jsonRDD, to load the data into a Hive table. I was able to
save the SchemaRDD I got via hiveContext.sql(...).saveAsParquetFile(path),
i.e. save the SchemaRDD as a Parquet file, but when I read the Parquet file
back (code below) and saved the data to a text file, the output contained
strange values like org.apache.spark.sql.api.java.Row@e26c01c7:

 JavaSchemaRDD parquetfilerdd = sqlCtx.parquetFile("path/to/parquet/File");
parquetfilerdd.registerTempTable("pq");
JavaSchemaRDD writetxt = sqlCtx.sql("SELECT * FROM pq");
writetxt.saveAsTextFile("Path/To/Text/Files");  // This step created text files
// filled with values like org.apache.spark.sql.api.java.Row@e26c01c7

 I know there must be a proper way to do this; I just haven't been able to
figure it out. Could you please help?
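
For reference, this is roughly what I imagine the jsonFile/jsonRDD-to-Hive step
might look like (an untested sketch against the Spark 1.1 Java API; the table
names and input path are placeholders, and depending on the version hql(...)
may be needed instead of sql(...)). I have not been able to verify it:

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.api.java.JavaSchemaRDD;
import org.apache.spark.sql.hive.api.java.JavaHiveContext;

public class JsonToHive {
    public static void main(String[] args) {
        JavaSparkContext ctx = new JavaSparkContext("local", "JsonToHive");
        JavaHiveContext hiveCtx = new JavaHiveContext(ctx);

        // Infer a schema from the JSON file and expose it as a temporary table.
        JavaSchemaRDD json = hiveCtx.jsonFile("path/to/input.json");
        json.registerTempTable("json_temp");

        // Create a Hive-managed table from the temporary table; an existing table
        // could instead be populated with INSERT INTO TABLE my_hive_table SELECT ... .
        hiveCtx.sql("CREATE TABLE my_hive_table AS SELECT * FROM json_temp");
    }
}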
   






Re: How to apply schema to queried data from Hive before saving it as parquet file?

2014-11-19 Thread akshayhazari
Sorry about the confusion I created; I only started learning this week.
Silly me, I was actually writing the Row objects' default string representation
to the text file and expecting records. The code below is what I was supposed
to do. Also, if you could let me know how to add the data from the
jsonFile/jsonRDD methods of hiveContext to Hive tables, it would be appreciated.

JavaRDD<String> result = writetxt.map(new Function<Row, String>() {

    public String call(Row row) {
        String temp = "";
        temp += row.getInt(0) + " ";
        temp += row.getString(1) + " ";
        temp += row.getInt(2);
        return temp;
    }
});
result.saveAsTextFile("pqtotxt");
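
For anyone hitting the same thing later, the mapping does not have to hard-code
the column positions or types. A small, untested sketch (it assumes the Java Row
API's length() and get(i) accessors; the output path is a placeholder) that
joins every field of each Row with spaces:

JavaRDD<String> generic = writetxt.map(new Function<Row, String>() {
    public String call(Row row) {
        StringBuilder sb = new StringBuilder();
        // Walk every column instead of assuming the (int, string, int) layout.
        for (int i = 0; i < row.length(); i++) {
            if (i > 0) sb.append(" ");
            sb.append(row.get(i));   // value of column i as an Object
        }
        return sb.toString();
    }
});
generic.saveAsTextFile("pqtotxt_generic");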






Building Spark for Hive: "The requested profile hadoop-1.2 could not be activated because it does not exist."

2014-11-17 Thread akshayhazari
I am using Apache Hadoop 1.2.1 and want to use Spark SQL with Hive, so I tried
to build Spark as follows:

 mvn -Phive,hadoop-1.2 -Dhadoop.version=1.2.1 clean -DskipTests package

But I get the following error:

  The requested profile "hadoop-1.2" could not be activated because it does not exist.

Is there some way to handle this, or do I have to downgrade Hadoop? Any help
is appreciated.







Re: Building Spark for Hive: "The requested profile hadoop-1.2 could not be activated because it does not exist."

2014-11-17 Thread akshayhazari
Oops, I guess this is the right way to do it:
mvn -Phive -Dhadoop.version=1.2.1 clean -DskipTests package






Query from two or more tables with Spark SQL. I have done this; is there any simpler solution?

2014-11-12 Thread akshayhazari
As of now my approach is to fetch all the data from the tables located in
different databases into separate RDDs, union them, and then query them
together. I want to know whether I can run the query directly while creating
the RDD, i.e. instead of creating two RDDs, fire a single query against both
tables in one JdbcRDD and build one RDD from the result. Any alternate methods
besides these are also welcome.
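
For reference, one possible alternative to the string-stitching further below
(a rough, untested sketch against the Spark 1.1 Java API; Record, rddFromDb1
and rddFromDb2 are hypothetical placeholders for a bean class and the two
JdbcRDD-backed JavaRDDs, and ctx is the JavaSparkContext) is to give both RDDs
a schema, register them as temporary tables, and run a single SQL statement
over them:

JavaSQLContext sqlCtx = new JavaSQLContext(ctx);

// One JavaRDD<Record> per database, each already loaded through a JdbcRDD.
JavaSchemaRDD table1 = sqlCtx.applySchema(rddFromDb1, Record.class);
JavaSchemaRDD table2 = sqlCtx.applySchema(rddFromDb2, Record.class);
table1.registerTempTable("t1");
table2.registerTempTable("t2");

// A single query over both sources; a JOIN would work here just as well.
JavaSchemaRDD combined = sqlCtx.sql(
    "SELECT id, intensity FROM t1 UNION ALL SELECT id, intensity FROM t2");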

My current code, shown below, is complex and fetches all the data up front;
I want to reduce the time it takes. Any help is appreciated.




import java.io.Serializable;
import scala.*;
import scala.runtime.AbstractFunction0;
import scala.runtime.AbstractFunction1;
import scala.runtime.*;
import scala.collection.mutable.LinkedHashMap;
import org.apache.spark.api.java.*;
import org.apache.spark.api.java.function.*;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.rdd.*;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.api.java.JavaSQLContext;
import org.apache.spark.sql.api.java.JavaSchemaRDD;
import org.apache.spark.sql.api.java.Row;
import org.apache.spark.sql.api.java.DataType;
import org.apache.spark.sql.api.java.StructType;
import org.apache.spark.sql.api.java.StructField;
import org.apache.spark.sql.api.java.Row;
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import java.sql.*;
import java.util.*;
import org.postgresql.Driver;
import java.io.*;
public class Spark_Postgresql {
public static class tableSchema implements Serializable {
private String ID;
private String INTENSITY;
public String getID() {
return ID;
}
public void setINTENSITY(String iNTENSITY) {
INTENSITY = iNTENSITY;
}
public String getINTENSITY() {
return INTENSITY;
}
public void setID(String iD) {
ID = iD;
}
  }
static class Z extends AbstractFunction0<java.sql.Connection> implements
Serializable
{
java.sql.Connection con;
static String
URL = "jdbc:postgresql://localhost:5432/postgres?user=postgres&password=postgres";
public java.sql.Connection apply()
{

try 
{
con=DriverManager.getConnection(URL);
}
catch(Exception e)
{
e.printStackTrace();
}
return con;
}
// public void change(String DB)
// {
//  URL = "jdbc:postgresql://" + DB + "?user=postgres&password=postgres";
// }

}
static class Z2 extends AbstractFunction0<java.sql.Connection>
implements Serializable
{
java.sql.Connection con;
static String
URL = "jdbc:postgresql://localhost:5432/postgres1?user=postgres&password=postgres";
public java.sql.Connection apply()
{

try 
{
con=DriverManager.getConnection(URL);
}
catch(Exception e)
{
e.printStackTrace();
}
return con;
}
}
static public class Z1 extends AbstractFunction1<ResultSet, String>
implements Serializable
{

public String apply(ResultSet i) {
String ret = "";
Integer colcnt=0;
try{
ResultSetMetaData meta = i.getMetaData();
colcnt=meta.getColumnCount();
Integer k=1,type;
while(k <= colcnt)
{
type=meta.getColumnType(k);
if(k==1)
{
if((type==Types.VARCHAR)||(type==Types.CHAR))
ret+=i.getString(k);
else if(type==Types.FLOAT)
ret+=String.valueOf(i.getFloat(k));
else if(type==Types.INTEGER)
ret+=String.valueOf(i.getInt(k));
else if(type==Types.DOUBLE)
ret+=String.valueOf(i.getDouble(k));
}
else
{
if((type==Types.VARCHAR)||(type==Types.CHAR))
ret += " " + i.getString(k);
else if(type==Types.FLOAT)
ret += " " + String.valueOf(i.getFloat(k));
else if((type==Types.INTEGER)||(type==Types.BIGINT))
ret += " " + String.valueOf(i.getInt(k));
else if(type==Types.DOUBLE)
ret += " " + String.valueOf(i.getDouble(k));
   

Combining data from two tables in two databases postgresql, JdbcRDD.

2014-11-11 Thread akshayhazari
I want to be able to perform a query over two tables that live in different
databases, and I want to know whether it can be done. I've heard about taking
the union of two RDDs, but here I effectively want to connect to what amounts
to different partitions of the same table.

Any help is appreciated.
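
For reference, the union idea mentioned above could look roughly like this
(an untested sketch against the Spark 1.1 Java API; Z2 is a hypothetical second
connection factory, identical to Z in the code below but pointing at the other
database, and the output path is a placeholder):

// One JdbcRDD per database, each with its own connection factory.
JdbcRDD fromDb1 = new JdbcRDD(sctx, new Z(),
    "SELECT * FROM spark WHERE ? <= id AND id <= ?",
    0L, 1000L, 10, new Z1(), scala.reflect.ClassTag$.MODULE$.AnyRef());
JdbcRDD fromDb2 = new JdbcRDD(sctx, new Z2(),
    "SELECT * FROM spark WHERE ? <= id AND id <= ?",
    0L, 1000L, 10, new Z1(), scala.reflect.ClassTag$.MODULE$.AnyRef());

// toJavaRDD() wraps the Scala RDDs so they can be unioned from the Java API.
JavaRDD both = fromDb1.toJavaRDD().union(fromDb2.toJavaRDD());
both.saveAsTextFile("hdfs://127.0.0.1:9000/user/hduser/combined");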


import java.io.Serializable;
//import org.junit.*;
//import static org.junit.Assert.*;
import scala.*;
import scala.runtime.AbstractFunction0;
import scala.runtime.AbstractFunction1;
import scala.runtime.*;
import scala.collection.mutable.LinkedHashMap;
//import static scala.collection.Map.Projection;
import org.apache.spark.api.java.*;
import org.apache.spark.api.java.function.*;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.rdd.*;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.api.java.JavaSQLContext;
import org.apache.spark.sql.api.java.JavaSchemaRDD;
import org.apache.spark.sql.api.java.Row;
import org.apache.spark.sql.api.java.DataType;
import org.apache.spark.sql.api.java.StructType;
import org.apache.spark.sql.api.java.StructField;
import org.apache.spark.sql.api.java.Row;
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import java.sql.*;
import java.util.*;
import com.mysql.jdbc.Driver;
import com.mysql.jdbc.*;
import java.io.*;
public class Spark_Mysql {
 
static class Z extends AbstractFunction0<java.sql.Connection> implements
Serializable
{
java.sql.Connection con;
public java.sql.Connection apply()
{

try {

con=DriverManager.getConnection("jdbc:mysql://localhost:3306/azkaban?user=azkaban&password=password");
}
catch(Exception e)
{
e.printStackTrace();
}
return con;
}

}
static public class Z1 extends AbstractFunction1<ResultSet, Integer>
implements Serializable
{
int ret;
public Integer apply(ResultSet i) {

try{
ret=i.getInt(1);
}
catch(Exception e)
{e.printStackTrace();}
return ret;
}
}
public static void main(String[] args) throws Exception {
String arr[]=new String[1];

arr[0]="/home/hduser/Documents/Credentials/Newest_Credentials_AX/spark-1.1.0-bin-hadoop1/lib/mysql-connector-java-5.1.33-bin.jar";

JavaSparkContext ctx = new JavaSparkContext(new SparkConf().setAppName("JavaSparkSQL").setJars(arr));
SparkContext sctx = new SparkContext(new SparkConf().setAppName("JavaSparkSQL").setJars(arr));
JavaSQLContext sqlCtx = new JavaSQLContext(ctx);

  try 
{
Class.forName("com.mysql.jdbc.Driver");
}
catch(Exception ex) 
{
ex.printStackTrace();
System.exit(1);
}
  JdbcRDD rdd = new JdbcRDD(sctx, new Z(),
      "SELECT * FROM spark WHERE ? <= id AND id <= ?",
      0L, 1000L, 10, new Z1(), scala.reflect.ClassTag$.MODULE$.AnyRef());
  
  rdd.saveAsTextFile("hdfs://127.0.0.1:9000/user/hduser/mysqlrdd");
  rdd.saveAsTextFile("/home/hduser/mysqlrdd");
  
}
}






Mysql retrieval and storage using JdbcRDD

2014-11-10 Thread akshayhazari
So far I have tried the code below, and it compiles successfully. There isn't
much documentation on using Spark with databases. I am using AbstractFunction0
and AbstractFunction1 here, but I am unable to access the database: the jar
just runs without doing anything when submitted. I want to know how this is
supposed to be done and what I have done wrong here. Any help is appreciated.

import java.io.Serializable;
import scala.*;
import scala.runtime.AbstractFunction0;
import scala.runtime.AbstractFunction1;
import scala.runtime.*;
import scala.collection.mutable.LinkedHashMap;
import org.apache.spark.api.java.*;
import org.apache.spark.api.java.function.*;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.rdd.*;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.api.java.JavaSQLContext;
import org.apache.spark.sql.api.java.JavaSchemaRDD;
import org.apache.spark.sql.api.java.Row;
import org.apache.spark.sql.api.java.DataType;
import org.apache.spark.sql.api.java.StructType;
import org.apache.spark.sql.api.java.StructField;
import org.apache.spark.sql.api.java.Row;
import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import java.sql.*;
import java.util.*;
import java.io.*;
public class Spark_Mysql {
@SuppressWarnings("serial")
static class Z extends AbstractFunction0<Connection>
{
Connection con;
public Connection apply()
{

try {
Class.forName("com.mysql.jdbc.Driver");

con=DriverManager.getConnection("jdbc:mysql://localhost:3306/?user=azkaban&password=password");
}
catch(Exception e)
{
}
return con;
}

}
static public class Z1 extends AbstractFunction1<ResultSet, Integer>
{
int ret;
public Integer apply(ResultSet i) {

try{
ret=i.getInt(1);
}
catch(Exception e)
{}
return ret;
}
}

@SuppressWarnings("serial")
public static void main(String[] args) throws Exception {

String arr[]=new String[1];

arr[0]="/home/hduser/Documents/Credentials/Newest_Credentials_AX/spark-1.1.0-bin-hadoop1/lib/mysql-connector-java-5.1.33-bin.jar";

JavaSparkContext ctx = new JavaSparkContext(new SparkConf().setAppName("JavaSparkSQL").setJars(arr));
SparkContext sctx = new SparkContext(new SparkConf().setAppName("JavaSparkSQL").setJars(arr));
JavaSQLContext sqlCtx = new JavaSQLContext(ctx);

  try 
{
Class.forName("com.mysql.jdbc.Driver");
}
catch(Exception ex) 
{
System.exit(1);
}
  Connection
zconn=DriverManager.getConnection("jdbc:mysql://localhost:3306/?user=azkaban&password=password");

  JdbcRDD rdd = new JdbcRDD(sctx, new Z(),
      "SELECT * FROM spark WHERE ? <= id AND id <= ?",
      0L, 1000L, 10, new Z1(), scala.reflect.ClassTag$.MODULE$.AnyRef());
  
  rdd.saveAsTextFile("hdfs://127.0.0.1:9000/user/hduser/mysqlrdd");
  rdd.saveAsTextFile("/home/hduser/mysqlrdd");
}
}
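
For what it's worth, two things usually worth double-checking with this pattern
are that the bound query really reads "? <= id AND id <= ?" (the two '?' marks
are the lower/upper id bounds JdbcRDD fills in per partition), and that the
connection factory and row mapper are Serializable so Spark can ship them to
the executors. A trimmed-down, untested sketch (ConnFactory and RowMapper are
hypothetical names; they would be nested in the main class like Z and Z1 above):

@SuppressWarnings("serial")
static class ConnFactory extends AbstractFunction0<java.sql.Connection> implements java.io.Serializable {
    public java.sql.Connection apply() {
        try {
            return DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/?user=azkaban&password=password");
        } catch (Exception e) {
            // Surface connection problems instead of silently returning null.
            throw new RuntimeException(e);
        }
    }
}

@SuppressWarnings("serial")
static class RowMapper extends AbstractFunction1<ResultSet, Integer> implements java.io.Serializable {
    public Integer apply(ResultSet rs) {
        try {
            return rs.getInt(1);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}

// Usage, with sctx built exactly as in main() above:
// JdbcRDD rdd = new JdbcRDD(sctx, new ConnFactory(),
//     "SELECT * FROM spark WHERE ? <= id AND id <= ?",
//     0L, 1000L, 10, new RowMapper(), scala.reflect.ClassTag$.MODULE$.AnyRef());
// rdd.saveAsTextFile("/home/hduser/mysqlrdd");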


