A stack trace would be helpful, if you can provide one.
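In case it helps, here is one way to capture the full trace as text so it can be pasted into a reply. This is only a minimal sketch using the standard `traceback` module. The names `run_and_capture` and `failing_action` are hypothetical and not from the original post; in the poster's session the action passed in would presumably be something like `lambda: BB.take(1)`.

```python
import traceback


def run_and_capture(action):
    """Call action(); return (result, None) on success or (None, trace_text) on failure."""
    try:
        return action(), None
    except Exception:
        # format_exc() returns the traceback of the exception being handled as a string
        return None, traceback.format_exc()


def failing_action():
    # Hypothetical stand-in for the failing Spark action, used so this sketch
    # runs without a Spark installation.
    return int("not a number")


result, trace = run_and_capture(failing_action)
if trace is not None:
    # Paste this text into the reply.
    print(trace)
```

Posting the whole text, rather than only the last line, usually makes it easier to see which stage of the job failed.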
On Mon, Oct 19, 2015 at 1:42 PM, fahad shah <sfaha...@gmail.com> wrote:

> Hi
>
> I am trying to do pair rdd's, group by the key assign id based on key.
> I am using Pyspark with spark 1.3, for some reason, I am getting this
> error that I am unable to figure out - any help much appreciated.
>
> Things I tried (but to no effect),
>
> 1. make sure I am not doing any conversions on the strings
> 2. make sure that the fields used in the key are all there and not
>    empty string (or else I toss the row out)
>
> My code is along following lines (split is using stringio to parse
> csv, header removes the header row and parse_train is putting the 54
> fields into named tuple after whitespace/quote removal):
>
> #Error for string argument is thrown on the BB.take(1) where the
> groupbykey is evaluated
>
> A = sc.textFile("train.csv").filter(lambda x:not
> isHeader(x)).map(split).map(parse_train).filter(lambda x: not x is
> None)
>
> A.count()
>
> B = A.map(lambda k:
>
> ((k.srch_destination_id,k.srch_length_of_stay,k.srch_booking_window,k.srch_adults_count,
> k.srch_children_count,k.srch_room_count),
> (k[0:54])))
> BB = B.groupByKey()
> BB.take(1)
>
>
> best fahad

--
Best Regards

Jeff Zhang