Hi,

I just migrated from zeppelin 0.7.0 to zeppelin 0.7.1 and I am facing this
error while creating an RDD(in pyspark).

UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0:
> invalid start byte


I was able to create the RDD without any error after adding
use_unicode=False as follows

> sc.textFile("file.csv",use_unicode=False)


​But it fails when I try to stem the text. I am getting similar error when
trying to apply stemming to the text using python interpreter.

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4:
> ordinal not in range(128)

All these code is working in 0.7.0 version. There is no change in the
dataset and code. ​Is there any change in the encoding type in the new
version of zeppelin?

Regards,
Meethu Mathew

Reply via email to