Access row column by field name

2015-12-16 Thread Daniel Valdivia
Hi,

I'm processing JSON from a text file using DataFrames. Right now I'm trying to
figure out a way to access a certain value within the rows of my DataFrame when
I only know the field name, not the field's position in the schema.

I noticed that row.schema and df.dtypes give me information about the
auto-generated schema, but I cannot see a straightforward path from there to the
value. I'm trying to create a PairRDD out of this.
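Since the goal is a pair RDD, one way to combine the schema information with positional access is to resolve each column index once, on the driver, with `StructType.fieldIndex`, and capture only those integers in the closure. This is a minimal sketch, assuming Spark 1.4 or later (`new SQLContext(sc)` and `read.json(RDD[String])` still compile on 2.x/3.x but are deprecated there), a local master, and hypothetical `id`/`severity` fields standing in for the real incident records:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object PairRddByFieldName {
  // Builds a pair RDD keyed by the (hypothetical) "id" field; returns it collected and sorted.
  def run(): Seq[(String, String)] = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[*]").setAppName("pair-rdd-by-field-name"))
    try {
      val sqlContext = new SQLContext(sc)
      // Hypothetical records standing in for the incidents file.
      val raw = sc.parallelize(Seq(
        """{"id": "INC-1", "severity": "high"}""",
        """{"id": "INC-2", "severity": "low"}"""))
      val df = sqlContext.read.json(raw)

      // Resolve the positions once, on the driver, from the inferred schema.
      val idIdx = df.schema.fieldIndex("id")
      val severityIdx = df.schema.fieldIndex("severity")

      // Only the two Ints are captured by the closure shipped to executors.
      val pairs = df.rdd.map(row => (row.getString(idIdx), row.getString(severityIdx)))
      pairs.collect().toSeq.sortBy(_._1)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit =
    run().foreach { case (id, severity) => println(s"$id -> $severity") }
}
```

Looking the index up once avoids a per-row name lookup; the trade-off is that every row must share the DataFrame's schema, which holds for rows produced by `df.rdd`.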

Is there any easy way to figure out a field's position from its field name (the
key it had in the JSON)?
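For the question as asked, a `Row` that carries a schema can resolve a name to a position itself: `Row.fieldIndex(name)` (Spark 1.4+) looks the name up in the row's schema, and rows taken from a DataFrame carry one. This is a minimal sketch, assuming a local master and hypothetical `id`/`severity` JSON records; `new SQLContext(sc)` and `read.json(RDD[String])` match the 1.x API used in this thread and are deprecated, but still present, in later versions:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object FieldIndexLookup {
  // Reads `fieldName` from every row by resolving its position by name, per row.
  def valuesOf(fieldName: String): Seq[String] = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[*]").setAppName("field-index-lookup"))
    try {
      val sqlContext = new SQLContext(sc)
      // Hypothetical records standing in for the incidents file.
      val raw = sc.parallelize(Seq(
        """{"id": "INC-1", "severity": "high"}""",
        """{"id": "INC-2", "severity": "low"}"""))
      val df = sqlContext.read.json(raw)

      df.rdd
        .map { row =>
          // Rows taken from a DataFrame carry their schema, so the name can be
          // resolved to a position; an unknown name raises an exception.
          val idx = row.fieldIndex(fieldName)
          row.getString(idx)
        }
        .collect()
        .toSeq
        .sorted
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit =
    valuesOf("severity").foreach(println)
}
```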

So this:

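import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
val rawIncRdd =
  sc.textFile("hdfs://1.2.3.4:8020/user/hadoop/incidents/unstructured/inc-0-500.txt")
// jsonRDD is deprecated since Spark 1.4; read.json builds the same DataFrame
val df = sqlContext.read.json(rawIncRdd)
// Access by position: index 0 of the inferred schema
df.foreach(line => println(line.getString(0)))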


would turn into something like this:

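import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
val rawIncRdd =
  sc.textFile("hdfs://1.2.3.4:8020/user/hadoop/incidents/unstructured/inc-0-500.txt")
val df = sqlContext.read.json(rawIncRdd)
// Desired: access by field name (Row has no getString(name: String) overload)
df.foreach(line => println(line.getString("field_name")))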

Thanks for the advice.

Re: Access row column by field name

2015-12-16 Thread Jeff Zhang
Use Row.getAs[String](fieldName).
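A sketch of that suggestion, assuming a local master and hypothetical `id`/`severity` JSON records (the `SQLContext`/`read.json(RDD[String])` calls are the 1.x API, deprecated but still present later). `getAs[T](fieldName)` does not check `T` against the schema, so it must match the inferred column type, and a key missing from a JSON record comes back as `null`; wrapping it in `Option` is one way to handle that:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object GetAsByName {
  // Returns (id, severity) for every row, reading both fields by name with Row.getAs.
  def run(): Seq[(String, Option[String])] = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[*]").setAppName("get-as-by-name"))
    try {
      val sqlContext = new SQLContext(sc)
      // Hypothetical records; the second one has no "severity" key.
      val raw = sc.parallelize(Seq(
        """{"id": "INC-1", "severity": "high"}""",
        """{"id": "INC-2"}"""))
      val df = sqlContext.read.json(raw)

      df.rdd
        .map { row =>
          // The type parameter is not checked against the schema: it must match
          // the inferred column type (StringType here).
          val id = row.getAs[String]("id")
          // A key absent from a record is null in that row; Option(...) maps it to None.
          val severity = Option(row.getAs[String]("severity"))
          (id, severity)
        }
        .collect()
        .toSeq
        .sortBy(_._1)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit =
    run().foreach(println)
}
```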

On Thu, Dec 17, 2015 at 10:58 AM, Daniel Valdivia wrote:

> Is there any easy way to figure out the field position by its field name
> (the key it had in the json)?

-- 
Best Regards

Jeff Zhang