Re: BUG: 1.3.0 org.apache.spark.sql.Row Does not exist in Java API

2015-04-18 Thread Olivier Girardot
Hi Nipun,
you're right, I created the pull request fixing the documentation:
https://github.com/apache/spark/pull/5569
and the corresponding issue:
https://issues.apache.org/jira/browse/SPARK-6992
Thank you for your time,

Olivier.

On Sat, Apr 18, 2015 at 01:11, Nipun Batra batrani...@gmail.com wrote:

 Hi Olivier,

 Thank you for responding.

 I am able to find org.apache.spark.sql.Row in spark-catalyst_2.10-1.3.0,
 but it was not visible in the API documentation yesterday (
 https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/package-frame.html).
 I am pretty sure of that.

 Also, I think this document needs to be updated:
 https://spark.apache.org/docs/latest/sql-programming-guide.html

 return Row.create(fields[0], fields[1].trim());


 needs to be replaced with RowFactory.create.
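 In the Spark 1.3 Java API, the factory lives on org.apache.spark.sql.RowFactory
 rather than Row itself. A minimal sketch of the corrected mapper, assuming the
 same comma-separated record format as the guide's example:

```java
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.Row;
// RowFactory is the Java-visible factory in Spark 1.3; Row.create was
// the older form the guide still shows.
import org.apache.spark.sql.RowFactory;

JavaRDD<Row> rowRDD = people.map(
  new Function<String, Row>() {
    public Row call(String record) throws Exception {
      String[] fields = record.split(",");
      // RowFactory.create(Object... values) replaces Row.create(...)
      return RowFactory.create(fields[0], fields[1].trim());
    }
  });
```

 This sketch assumes `people` is the JavaRDD&lt;String&gt; loaded earlier in the
 guide's example; only the factory call changes.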

 Thanks again for your response.

 Thanks
 Nipun Batra



 On Fri, Apr 17, 2015 at 2:50 PM, Olivier Girardot ssab...@gmail.com
 wrote:

 Hi Nipun,
 I'm sorry, but I don't understand exactly what your problem is.
 Regarding org.apache.spark.sql.Row, it does exist in the Spark SQL
 dependency.
 Is it a compilation problem?
 Are you trying to run a main method using the pom you've just described,
 or are you trying to spark-submit the jar?
 If you're trying to run a main method, the provided scope is not designed
 for that and will make your program fail.
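 For running a main method locally (e.g. from an IDE), one sketch is to take
 the Spark dependency out of provided scope; assuming the same artifact as
 the pom shown below:

```xml
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.10</artifactId>
  <version>1.3.0</version>
  <!-- "compile" (the Maven default) puts Spark on the classpath for local
       main() runs; keep "provided" when the jar is launched via spark-submit,
       where the cluster supplies the Spark classes. -->
  <scope>compile</scope>
</dependency>
```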

 Regards,

 Olivier.

 On Fri, Apr 17, 2015 at 21:52, Nipun Batra bni...@gmail.com wrote:

 Hi

 The example given in SQL document
 https://spark.apache.org/docs/latest/sql-programming-guide.html

 org.apache.spark.sql.Row does not exist in the Java API, or at least I was
 not able to find it.

 Build info: downloaded from the Spark website.

 Dependency:

 <dependency>
   <groupId>org.apache.spark</groupId>
   <artifactId>spark-sql_2.10</artifactId>
   <version>1.3.0</version>
   <scope>provided</scope>
 </dependency>

 Code in the documentation:

 // Import factory methods provided by DataType.
 import org.apache.spark.sql.types.DataType;
 // Import StructType and StructField.
 import org.apache.spark.sql.types.StructType;
 import org.apache.spark.sql.types.StructField;
 // Import Row.
 import org.apache.spark.sql.Row;

 // sc is an existing JavaSparkContext.
 SQLContext sqlContext = new org.apache.spark.sql.SQLContext(sc);

 // Load a text file and convert each line to a JavaBean.
 JavaRDD<String> people = sc.textFile("examples/src/main/resources/people.txt");

 // The schema is encoded in a string.
 String schemaString = "name age";

 // Generate the schema based on the string of schema.
 List<StructField> fields = new ArrayList<StructField>();
 for (String fieldName : schemaString.split(" ")) {
   fields.add(DataType.createStructField(fieldName, DataType.StringType, true));
 }
 StructType schema = DataType.createStructType(fields);

 // Convert records of the RDD (people) to Rows.
 JavaRDD<Row> rowRDD = people.map(
   new Function<String, Row>() {
     public Row call(String record) throws Exception {
       String[] fields = record.split(",");
       return Row.create(fields[0], fields[1].trim());
     }
   });

 // Apply the schema to the RDD.
 DataFrame peopleDataFrame = sqlContext.createDataFrame(rowRDD, schema);

 // Register the DataFrame as a table.
 peopleDataFrame.registerTempTable("people");

 // SQL can be run over RDDs that have been registered as tables.
 DataFrame results = sqlContext.sql("SELECT name FROM people");

 // The results of SQL queries are DataFrames and support all the
 // normal RDD operations. The columns of a row in the result can be
 // accessed by ordinal.
 List<String> names = results.map(new Function<Row, String>() {
   public String call(Row row) {
     return "Name: " + row.getString(0);
   }
 }).collect();


 Thanks
 Nipun




