Hi Nipun,
You're right; I've created a pull request fixing the documentation:
https://github.com/apache/spark/pull/5569
and the corresponding issue:
https://issues.apache.org/jira/browse/SPARK-6992
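For reference, here is the gist of the fix (a sketch against the 1.3 Java API, reusing the people RDD from the guide's example):

    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.RowFactory;
    import org.apache.spark.api.java.function.Function;

    // In the Java API, Rows are built through RowFactory:
    JavaRDD<Row> rowRDD = people.map(
      new Function<String, Row>() {
        public Row call(String record) throws Exception {
          String[] fields = record.split(",");
          // was: Row.create(...), which isn't part of the Java API
          return RowFactory.create(fields[0], fields[1].trim());
        }
      });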
Thank you for your time,
Olivier.
On Sat, Apr 18, 2015 at 01:11, Nipun Batra batrani...@gmail.com wrote:
Hi Olivier,
Thank you for responding.
I am able to find org.apache.spark.sql.Row in spark-catalyst_2.10-1.3.0,
but I am pretty sure it was not visible in the API documentation yesterday (
https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/package-frame.html).
Also, I think this document needs to be updated:
https://spark.apache.org/docs/latest/sql-programming-guide.html
The line
    return Row.create(fields[0], fields[1].trim());
needs to be replaced with RowFactory.create.
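That is, something like this (only the factory call changes):

    // before (as in the guide):
    return Row.create(fields[0], fields[1].trim());
    // after:
    return RowFactory.create(fields[0], fields[1].trim());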
Thanks again for your response.
Thanks
Nipun Batra
On Fri, Apr 17, 2015 at 2:50 PM, Olivier Girardot ssab...@gmail.com
wrote:
Hi Nipun,
I'm sorry, but I don't understand exactly what your problem is.
Regarding org.apache.spark.sql.Row: it does exist in the Spark SQL
dependency.
Is it a compilation problem?
Are you trying to run a main method using the pom you've just described,
or are you trying to spark-submit the jar?
If you're trying to run a main method, the provided scope is not designed
for that and will make your program fail.
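For a plain main method run (e.g. from an IDE), the Spark jars need to be
on the runtime classpath, so you would drop the provided scope (a sketch;
provided only makes sense when spark-submit supplies the Spark jars at
runtime):

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.10</artifactId>
      <version>1.3.0</version>
      <!-- default (compile) scope: no <scope>provided</scope> -->
    </dependency>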
Regards,
Olivier.
On Fri, Apr 17, 2015 at 21:52, Nipun Batra bni...@gmail.com wrote:
Hi
In the example given in the SQL programming guide
https://spark.apache.org/docs/latest/sql-programming-guide.html
org.apache.spark.sql.Row does not exist in the Java API, or at least I was
not able to find it.
Build info: downloaded from the Spark website.
Dependency:

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.10</artifactId>
      <version>1.3.0</version>
      <scope>provided</scope>
    </dependency>
Code in documentation:

    // Import factory methods provided by DataType.
    import org.apache.spark.sql.types.DataType;
    // Import StructType and StructField
    import org.apache.spark.sql.types.StructType;
    import org.apache.spark.sql.types.StructField;
    // Import Row.
    import org.apache.spark.sql.Row;

    // sc is an existing JavaSparkContext.
    SQLContext sqlContext = new org.apache.spark.sql.SQLContext(sc);

    // Load a text file and convert each line to a JavaBean.
    JavaRDD<String> people = sc.textFile("examples/src/main/resources/people.txt");

    // The schema is encoded in a string
    String schemaString = "name age";

    // Generate the schema based on the string of schema
    List<StructField> fields = new ArrayList<StructField>();
    for (String fieldName : schemaString.split(" ")) {
      fields.add(DataType.createStructField(fieldName, DataType.StringType, true));
    }
    StructType schema = DataType.createStructType(fields);

    // Convert records of the RDD (people) to Rows.
    JavaRDD<Row> rowRDD = people.map(
      new Function<String, Row>() {
        public Row call(String record) throws Exception {
          String[] fields = record.split(",");
          return Row.create(fields[0], fields[1].trim());
        }
      });

    // Apply the schema to the RDD.
    DataFrame peopleDataFrame = sqlContext.createDataFrame(rowRDD, schema);

    // Register the DataFrame as a table.
    peopleDataFrame.registerTempTable("people");

    // SQL can be run over RDDs that have been registered as tables.
    DataFrame results = sqlContext.sql("SELECT name FROM people");

    // The results of SQL queries are DataFrames and support all the
    // normal RDD operations.
    // The columns of a row in the result can be accessed by ordinal.
    List<String> names = results.map(new Function<Row, String>() {
      public String call(Row row) {
        return "Name: " + row.getString(0);
      }
    }).collect();
Thanks
Nipun