apache-spark doesn't work correctly with the Russian alphabet

2017-01-18 Thread AlexModestov
I want to use Apache Spark to work with text data. There are some Russian symbols, but Apache Spark shows me strings that look like "...\u0413\u041e\u0420\u041e...". What should I do to display them correctly?
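
Those \uXXXX sequences are usually not corruption: they are the Python 2 repr() of valid unicode data. A minimal sketch, assuming Python 2 and a DataFrame df with a string column "text" (both names are hypothetical):

    # collect() returns unicode objects; repr() displays them as \uXXXX
    # escapes, but encoding and printing shows the actual Cyrillic letters.
    for row in df.select("text").collect():
        print(row.text.encode("utf-8"))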

spark uses the /tmp directory instead of the directory from spark.local.dir

2016-12-15 Thread AlexModestov
Hello! I want to use another directory instead of /tmp for all temporary data. I set spark.local.dir and -Djava.io.tmpdir=/... but I see that Spark still uses /tmp for some data. What is Spark doing, and what should I do so that Spark uses only my directories? Thank you!
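
For reference, a sketch of the relevant settings in spark-defaults.conf, assuming a hypothetical scratch directory /data/spark-tmp. spark.local.dir covers Spark's shuffle and spill files; java.io.tmpdir has to reach each JVM separately via the extra-options settings:

    spark.local.dir                  /data/spark-tmp
    spark.driver.extraJavaOptions    -Djava.io.tmpdir=/data/spark-tmp
    spark.executor.extraJavaOptions  -Djava.io.tmpdir=/data/spark-tmp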

work with Russian letters

2016-08-24 Thread AlexModestov
Hello everybody, I want to work with DataFrames where some columns have a string type and contain Russian letters, and the Russian letters come out garbled. Could you help me with how I should work with them? Thanks.
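
One common cause is a non-UTF-8 source file, since Spark decodes text as UTF-8 by default. A minimal sketch, assuming the input is cp1251-encoded text (the path and encoding are assumptions):

    # Read raw bytes, then decode with the file's actual encoding.
    rdd = sc.textFile("/path/to/data.txt", use_unicode=False)
    decoded = rdd.map(lambda line: line.decode("cp1251"))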

GC overhead limit exceeded

2016-05-16 Thread AlexModestov
I get this error in Apache Spark. My configuration is "spark.driver.memory 60g spark.python.worker.memory 60g spark.master local[*]". The amount of data is about 5 GB, but Spark says "GC overhead limit exceeded". I think my conf file provides enough resources. "16/05/16 15:13:02 WARN
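
One thing worth checking: in local[*] mode everything runs inside the single driver JVM, and spark.driver.memory only takes effect if it is set before that JVM starts. A sketch of a launch that applies it at startup (the script name is hypothetical):

    spark-submit --master "local[*]" --driver-memory 60g my_job.py

Note also that spark.python.worker.memory is a spill threshold for the Python worker processes, not a JVM heap setting, so it does not enlarge the heap that is hitting the GC limit.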

Re: ML regression - spark context dies without error

2016-05-12 Thread AlexModestov
Hello, I have the same problem. Sometimes I get the error "Py4JError: Answer from Java side is empty"; sometimes my code works fine, and sometimes it doesn't. Did you find out why it happens? What was the reason? Thanks.

Re: Need for advice - performance improvement and out of memory resolution

2016-05-12 Thread AlexModestov
Hello. I'm sorry, but did you find the answer? I have a similar error and I cannot solve it; no one has answered me. The Spark driver dies and I get the error "Answer from Java side is empty". I thought it was because I made a mistake in the conf file. I use Sparkling Water 1.6.3, Spark

Error: "Answer from Java side is empty"

2016-05-11 Thread AlexModestov
I use Sparkling Water 1.6.3 and Spark 1.6, with Oracle Java 8 or OpenJDK 7. Every time I transform a Spark DataFrame into an H2O DataFrame I get this error and the Spark cluster dies: ERROR:py4j.java_gateway:Error while sending or receiving. Traceback (most recent call last): File
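
"Error while sending or receiving" on the Py4J gateway generally means the driver JVM itself went down, often from running out of memory during the Spark-to-H2O conversion. A sketch of a first mitigation, assuming bin/pysparkling forwards standard pyspark launch options (the heap size is only an example):

    bin/pysparkling --driver-memory 30g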

SQL Driver

2016-04-19 Thread AlexModestov
Hello all, I use this string when launching Sparkling-Water: "--conf spark.driver.extraClassPath='/SQLDrivers/sqljdbc_4.2/enu/sqljdbc41.jar" and I get the error: " --- TypeError Traceback
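
As quoted, the value has an opening single quote but no closing one, which the shell would trip over before Spark ever sees the path. A sketch of the launch option with balanced quotes (same jar path as in the post):

    --conf spark.driver.extraClassPath='/SQLDrivers/sqljdbc_4.2/enu/sqljdbc41.jar'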

error "Py4JJavaError: An error occurred while calling z:org.apache.spark.sql.execution.EvaluatePython.takeAndServe."

2016-04-13 Thread AlexModestov
I get this error. Does anyone know what it means? Py4JJavaError: An error occurred while calling z:org.apache.spark.sql.execution.EvaluatePython.takeAndServe. : org.apache.spark.SparkException: Job aborted due to stage failure: Exception while getting task result:

Re: spark.driver.extraClassPath and export SPARK_CLASSPATH

2016-04-13 Thread AlexModestov
I wrote in "spark-defaults.conf" spark.driver.extraClassPath '/dir' or "PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" /.../sparkling-water-1.6.1/bin/pysparkling \ --conf spark.driver.extraClassPath='/.../sqljdbc41.jar' Nothing works -- View this message in context:

An error occurred while calling z:org.apache.spark.sql.execution.EvaluatePython.takeAndServe

2016-04-12 Thread AlexModestov
I get an error while I build a DataFrame from a parquet file: Py4JJavaError: An error occurred while calling z:org.apache.spark.sql.execution.EvaluatePython.takeAndServe. : org.apache.spark.SparkException: Job aborted due to stage failure: Exception while getting task result:

spark.driver.extraClassPath and export SPARK_CLASSPATH

2016-04-11 Thread AlexModestov
Hello, I've started to use Spark 1.6.1 (before that I used Spark 1.5). I included the line export SPARK_CLASSPATH="/SQLDrivers/sqljdbc_4.2/enu/sqljdbc41.jar" when I launched pysparkling and it worked well. But in version 1.6.1 there is an error saying that it's deprecated and I had to use
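
A sketch of the 1.6-era replacement for SPARK_CLASSPATH, split into driver-side and executor-side settings in spark-defaults.conf (same jar as in the post):

    spark.driver.extraClassPath   /SQLDrivers/sqljdbc_4.2/enu/sqljdbc41.jar
    spark.executor.extraClassPath /SQLDrivers/sqljdbc_4.2/enu/sqljdbc41.jar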

Spark demands HiveContext but I use only SqlContext

2016-04-11 Thread AlexModestov
Hello! I work with SQLContext; I run a query against MS SQL Server and fetch the data. Spark tells me that I have to install Hive. I started using Spark 1.6.1 (before I used Spark 1.5 and never ran into this requirement). Py4JJavaError: An error occurred while calling
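
A minimal sketch of constructing a plain SQLContext explicitly instead of relying on the shell's default context (which is a HiveContext in Spark 1.6 builds that include Hive support); the JDBC URL and table name are hypothetical:

    from pyspark.sql import SQLContext

    sqlContext = SQLContext(sc)
    url = "jdbc:sqlserver://host:1433;databaseName=mydb"  # example URL
    df = sqlContext.read.jdbc(url=url, table="my_table")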

sql functions: row_number, percent_rank, rank, rowNumber

2016-03-10 Thread AlexModestov
Hello all, I'm trying to use some SQL functions. My task is to renumber the rows in a DataFrame. I use the SQL functions but they don't work and I don't understand why. I would appreciate your help in fixing this issue. Thank you! The piece of my code: "from pyspark.sql.functions import row_number, percent_rank, rank,
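
The usual missing piece: row_number, percent_rank, and rank are window functions, so they only do something inside .over(windowSpec), and in Spark 1.x window functions additionally require a HiveContext. A sketch with hypothetical column names:

    from pyspark.sql import Window
    from pyspark.sql.functions import row_number

    w = Window.orderBy("id")  # the ordering column is an example
    df = df.withColumn("rn", row_number().over(w))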

spark.driver.maxResultSize doesn't work in the conf file

2016-02-20 Thread AlexModestov
I have the line spark.driver.maxResultSize=0 in spark-defaults.conf. But I get the error: "org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 18 tasks (1070.5 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)". But if I write --conf
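
For reference, a sketch of the two equivalent ways to set it (0 removes the limit); if the conf-file version seems to be ignored, it is worth confirming that the file being edited is the spark-defaults.conf actually read at launch:

    # in spark-defaults.conf (key and value separated by whitespace):
    spark.driver.maxResultSize 0

    # or on the command line (the script name is hypothetical):
    spark-submit --conf spark.driver.maxResultSize=0 my_job.py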

an error when I read data from parquet

2016-02-19 Thread AlexModestov
Hello everybody, I use the Python API and the Scala API. I read the data without problems with the Python API: "sqlContext = SQLContext(sc) data_full = sqlContext.read.parquet("---")" But when I use Scala: "val sqlContext = new SQLContext(sc) val data_full = sqlContext.read.parquet("---")" I get the error (I

Scala from Jupyter

2016-02-16 Thread AlexModestov
Hello! I want to use Scala from Jupyter (or maybe something else, if you can recommend anything; I mean an IDE). Does anyone know how I can do this? Thank you!