Hi, did you try using single quotes instead of double quotes around the
column name? I faced a similar situation with Apache Phoenix.
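In case it helps, here is a minimal sketch of filtering on a field by name
with a regexp instead of a numeric index. It assumes the Row objects expose
their fields as attributes, the way a Python named tuple does; I have not
checked this against your Spark version. The namedtuple Row, the sample rows,
the pattern, and the is_open_problem helper are all made up for illustration,
so the snippet runs without Spark.

```python
import re
from collections import namedtuple

# Stand-in for a SchemaRDD Row. ASSUMPTION: the real Row lets you read
# fields as attributes (row.text), like a named tuple.
Row = namedtuple("Row", ["annotID", "text"])

# Hypothetical sample data, not taken from the real table.
rows = [
    Row(annotID=1, text="The cause remains unknown in most cases."),
    Row(annotID=2, text="Abstract"),
]

# Hypothetical pattern; a regexp can express more than SQL's LIKE.
pattern = re.compile(r"remains\s+unknown", re.IGNORECASE)

def is_open_problem(row):
    # Read the field by name rather than by position (row[1]).
    return pattern.search(row.text) is not None

# Plain-Python equivalent of calling filter() on the RDD.
open_probs = [row for row in rows if is_open_problem(row)]
print([row.annotID for row in open_probs])
```

If the attribute assumption holds for your Rows, the same predicate could be
passed directly, e.g. sentenceRDD.filter(is_open_problem).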
On Saturday, November 15, 2014, Daniel, Ronald (ELS-SDG)
r.dan...@elsevier.com wrote:
Hi all,
I have a SchemaRDD that is loaded from a file. Each Row contains 7 fields,
one of which holds the text for a sentence from a document.
# Load sentence data table
sentenceRDD = sqlContext.parquetFile('s3n://some/path/thing')
sentenceRDD.take(3)
Out[20]: [Row(annotID=118, annotSet=u'ge', annotType=u'sentence',
endOffset=20194, pii=u'0094576587900440', startOffset=20062, text=u'Paper
IAF-86-85 presented at the 37th Congress of the International Astronautical
Federation, Innsbruck, Austria, 4-11 October 1986.'), Row(annotID=163,
annotSet=u'ge', annotType=u'sentence', endOffset=20249,
pii=u'0094576587900440', startOffset=20194, text=u"The landsat sensors:
Eosat's plans for landsats 6 and 7"), Row(annotID=190, annotSet=u'ge',
annotType=u'sentence', endOffset=20342, pii=u'0094576587900440',
startOffset=20334, text=u'Abstract')]
I have this registered as a table and can query it with SQL select
statements. I would also like to filter the RDD using text operations like
regexps that have greater capabilities than SQL's LIKE operator. However,
the code below does not work. Instead I get a runtime error.
openProbsRDD = sentenceRDD.filter(lambda row: "remains unknown" in
row["text"])
openProbsRDD.take(5)
…
TypeError: tuple indices must be integers, not str
…
If I use row[6] instead of row["text"] I get what I am looking for.
However, finding the right numeric index could be a pain.
Can I access the fields in a Row of a SchemaRDD by name, so that I can
map, filter, etc. without a trial and error process of finding the right
int for the fieldname?
Thanks,
Ron Daniel
--
Regards,
Vikas Agarwal