Apache Spark doesn't work correctly with the Russian alphabet
I want to use Apache Spark to work with text data. There are some Russian characters, but Apache Spark shows me strings that look like "...\u0413\u041e\u0420\u041e...". What should I do to display them correctly? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/apache-spark-doesn-t-work-correktly-with-russian-alphabet-tp28316.html
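Strings printed as "\u0413\u041e\u0420\u041e" are usually not corrupted; that is just how Python 2 escapes non-ASCII characters when it shows the repr() of unicode objects, for example when you print a list of Rows. A minimal sketch of how to check this, assuming a hypothetical DataFrame df with a string column named "text":

# Python 2 / PySpark: take() returns Row objects whose repr() escapes Cyrillic as \uXXXX.
rows = df.select("text").take(5)
print(rows)                          # looks like [Row(text=u'\u0413\u041e\u0420\u041e...')]
for r in rows:
    print(r.text.encode("utf-8"))    # prints the actual Cyrillic characters in a UTF-8 terminal
# df.show() also renders the real characters, provided the terminal locale is UTF-8.

If the characters are still wrong after encoding to UTF-8, the source data was probably read with the wrong encoding in the first place.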
Spark uses the /tmp directory instead of the directory from spark.local.dir
Hello! I want to use another directory instead of /tmp for all temporary data. I set spark.local.dir and -Djava.io.tmpdir=/... but I see that Spark still uses /tmp for some data. What does Spark store there, and what should I do so that Spark uses only my directories? Thank you! -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-use-tmp-directory-instead-of-directory-from-spark-local-dir-tp28217.html
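For what it is worth, a minimal sketch of setting both the Spark scratch directory and the executor JVM temp directory (the path /data/spark-tmp is only an example); some files can still land in /tmp, e.g. anything written by third-party libraries or by the driver JVM before these options take effect:

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .set("spark.local.dir", "/data/spark-tmp")   # shuffle spill / block-manager scratch space
        .set("spark.executor.extraJavaOptions", "-Djava.io.tmpdir=/data/spark-tmp"))
# For the driver JVM, -Djava.io.tmpdir usually has to be passed at launch time
# (spark-defaults.conf or the command line), not from inside an already running process.
sc = SparkContext(conf=conf)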
Work with Russian letters
Hello everybody, I want to work with DataFrames where some columns have a string type and contain Russian letters. The Russian letters come out garbled in the text. Could you help me with how I should work with them? Thanks. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/work-with-russian-letters-tp27594.html
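Garbled Cyrillic usually means the input file is not UTF-8 (Windows-1251 is common for Russian text) while Spark decodes it as UTF-8. A minimal sketch, assuming a hypothetical input path and that the file really is cp1251-encoded:

# Read the file as raw bytes and decode with the real encoding before building a DataFrame.
raw = sc.textFile("/data/input.txt", use_unicode=False)    # keep each line as a byte string
lines = raw.map(lambda b: b.decode("cp1251"))              # decode from Windows-1251 to unicode
df = sqlContext.createDataFrame(lines.map(lambda s: (s,)), ["text"])
df.show()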
GC overhead limit exceeded
I get this error in Apache Spark. My configuration is:

spark.driver.memory 60g
spark.python.worker.memory 60g
spark.master local[*]

The amount of data is about 5 GB, but Spark says "GC overhead limit exceeded". I think my conf-file gives it enough resources.

16/05/16 15:13:02 WARN NettyRpcEndpointRef: Error sending message [message = Heartbeat(driver,[Lscala.Tuple2;@87576f9,BlockManagerId(driver, localhost, 59407))] in 1 attempts
org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [10 seconds]. This timeout is controlled by spark.executor.heartbeatInterval
  at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:48)
  at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:63)
  at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
  at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:33)
  at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76)
  at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
  at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:449)
  at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply$mcV$sp(Executor.scala:470)
  at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:470)
  at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:470)
  at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1765)
  at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:470)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10 seconds]
  at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
  at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
  at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
  at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
  at scala.concurrent.Await$.result(package.scala:107)
  at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
  ... 14 more
16/05/16 15:13:02 WARN NettyRpcEnv: Ignored message: HeartbeatResponse(false)
05-16 15:13:26.398 127.0.0.1:54321 2059 #e Thread WARN: Swapping! GC CALLBACK, (K/V:29.74 GB + POJO:16.74 GB + FREE:11.03 GB == MEM_MAX:57.50 GB), desiredKV=7.19 GB OOM!
05-16 15:13:44.528 127.0.0.1:54321 2059 #e Thread WARN: Swapping! GC CALLBACK, (K/V:29.74 GB + POJO:16.86 GB + FREE:10.90 GB == MEM_MAX:57.50 GB), desiredKV=7.19 GB OOM!
05-16 15:13:56.847 127.0.0.1:54321 2059 #e Thread WARN: Swapping! GC CALLBACK, (K/V:29.74 GB + POJO:16.88 GB + FREE:10.88 GB == MEM_MAX:57.50 GB), desiredKV=7.19 GB OOM!
05-16 15:14:10.215 127.0.0.1:54321 2059 #e Thread WARN: Swapping! GC CALLBACK, (K/V:29.74 GB + POJO:16.90 GB + FREE:10.86 GB == MEM_MAX:57.50 GB), desiredKV=7.19 GB OOM!
05-16 15:14:33.622 127.0.0.1:54321 2059 #e Thread WARN: Swapping! GC CALLBACK, (K/V:29.74 GB + POJO:16.91 GB + FREE:10.85 GB == MEM_MAX:57.50 GB), desiredKV=7.19 GB OOM!
05-16 15:14:47.075 127.0.0.1:54321 2059 #e Thread WARN: Swapping! GC CALLBACK, (K/V:29.74 GB + POJO:16.93 GB + FREE:10.84 GB == MEM_MAX:57.50 GB), desiredKV=7.19 GB OOM!
05-16 15:15:10.555 127.0.0.1:54321 2059 #e Thread WARN: Swapping! GC CALLBACK, (K/V:29.74 GB + POJO:16.92 GB + FREE:10.84 GB == MEM_MAX:57.50 GB), desiredKV=7.19 GB OOM!
05-16 15:15:25.520 127.0.0.1:54321 2059 #e Thread WARN: Swapping! GC CALLBACK, (K/V:29.74 GB + POJO:16.93 GB + FREE:10.84 GB == MEM_MAX:57.50 GB), desiredKV=7.19 GB OOM!
05-16 15:15:39.087 127.0.0.1:54321 2059 #e Thread WARN: Swapping! GC CALLBACK, (K/V:29.74 GB + POJO:16.93 GB + FREE:10.84 GB == MEM_MAX:57.50 GB), desiredKV=7.19 GB OOM!
Exception in thread "HashSessionScavenger-0" java.lang.OutOfMemoryError: GC overhead limit exceeded
  at java.util.concurrent.ConcurrentHashMap$ValuesView.iterator(ConcurrentHashMap.java:4683)
  at org.eclipse.jetty.server.session.HashSessionManager.scavenge(HashSessionManager.java:314)
  at
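Not a fix for the underlying memory pressure, but a minimal sketch of raising the timeouts that appear in the trace (the values are examples, not recommendations), so that long GC pauses do not additionally kill the heartbeat; note that spark.driver.memory only takes effect if it is set before the driver JVM starts (spark-defaults.conf or spark-submit), not from inside a running program:

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setMaster("local[*]")
        .set("spark.executor.heartbeatInterval", "60s")   # the timeout named in the RpcTimeoutException
        .set("spark.network.timeout", "600s"))            # general RPC timeout; keep it above the heartbeat interval
sc = SparkContext(conf=conf)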
Re: ML regression - spark context dies without error
Hello, I have the same problem... Sometimes I get the error "Py4JError: Answer from Java side is empty", and sometimes my code works fine. Did you find out why it happens? What was the reason? Thanks. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/ML-regression-spark-context-dies-without-error-tp22633p26938.html
Re: Need for advice - performance improvement and out of memory resolution
Hello. I'm sorry, but did you find the answer? I have a similar error and I cannot solve it; no one has answered me. The Spark driver dies and I get the error "Answer from Java side is empty". I thought it was because I made a mistake in my conf-file. I use Sparkling Water 1.6.3 and Spark 1.6, with Oracle Java 8 or OpenJDK 7. I get this error every time I transform a Spark DataFrame into an H2O DataFrame:

ERROR:py4j.java_gateway:Error while sending or receiving.
Traceback (most recent call last):
  File ".../Spark1.6/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 746, in send_command
    raise Py4JError("Answer from Java side is empty")
Py4JError: Answer from Java side is empty
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server
Traceback (most recent call last):
  File ".../Spark1.6/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 690, in start
    self.socket.connect((self.address, self.port))
  File "/usr/local/anaconda/lib/python2.7/socket.py", line 228, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 111] Connection refused
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server
Traceback (most recent call last):

My conf-file:

spark.serializer org.apache.spark.serializer.KryoSerializer
spark.kryoserializer.buffer.max 1500mb
spark.driver.memory 65g
spark.driver.extraJavaOptions -XX:-PrintGCDetails -XX:PermSize=35480m -XX:-PrintGCTimeStamps -XX:-PrintTenuringDistribution
spark.python.worker.memory 65g
spark.local.dir /data/spark-tmp
spark.ext.h2o.client.log.dir /data/h2o
spark.logConf false
spark.master local[*]
spark.driver.maxResultSize 0
spark.eventLog.enabled True
spark.eventLog.dir /data/spark_log

In the code I persist the data (the amount of data is 5.7 GB). I think there is enough memory. Could anyone help me? Thanks! -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Need-for-advice-performance-improvement-and-out-of-memory-resolution-tp24886p26937.html
Error: "Answer from Java side is empty"
I use Sparkling Water 1.6.3 and Spark 1.6, with Oracle Java 8 or OpenJDK 7. Every time I transform a Spark DataFrame into an H2O DataFrame I get this error and the Spark cluster dies:

ERROR:py4j.java_gateway:Error while sending or receiving.
Traceback (most recent call last):
  File ".../Spark1.6/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 746, in send_command
    raise Py4JError("Answer from Java side is empty")
Py4JError: Answer from Java side is empty
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server
Traceback (most recent call last):
  File ".../Spark1.6/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 690, in start
    self.socket.connect((self.address, self.port))
  File "/usr/local/anaconda/lib/python2.7/socket.py", line 228, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 111] Connection refused
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server
Traceback (most recent call last):

My conf-file:

spark.serializer org.apache.spark.serializer.KryoSerializer
spark.kryoserializer.buffer.max 1500mb
spark.driver.memory 65g
spark.driver.extraJavaOptions -XX:-PrintGCDetails -XX:PermSize=35480m -XX:-PrintGCTimeStamps -XX:-PrintTenuringDistribution
spark.python.worker.memory 65g
spark.local.dir /data/spark-tmp
spark.ext.h2o.client.log.dir /data/h2o
spark.logConf false
spark.master local[*]
spark.driver.maxResultSize 0
spark.eventLog.enabled True
spark.eventLog.dir /data/spark_log

In the code I persist the data (the amount of data is 5.7 GB). There is nothing in the H2O log files. I think there is enough memory. Could anyone help me? Thanks! -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Error-Answer-from-Java-side-is-empty-tp26929.html
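For reference, a minimal sketch of the conversion step that triggers the error, with the DataFrame persisted and materialized first (df is a hypothetical DataFrame; as_h2o_frame is the pysparkling conversion call as I understand the Sparkling Water 1.6 API, so treat the exact names as an assumption):

from pyspark import StorageLevel
from pysparkling import H2OContext

hc = H2OContext(sc).start()                 # start the H2O cloud inside the Spark application
df.persist(StorageLevel.MEMORY_AND_DISK)    # avoid recomputing the lineage during the conversion
df.count()                                  # materialize the data before handing it to H2O
h2o_frame = hc.as_h2o_frame(df)             # the step where "Answer from Java side is empty" shows up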
SQL Driver
Hello all, I pass this option when launching Sparkling Water: "--conf spark.driver.extraClassPath='/SQLDrivers/sqljdbc_4.2/enu/sqljdbc41.jar" and I get the error:

TypeError                                 Traceback (most recent call last)
in <module>()
      1 from pysparkling import *
----> 2 hc = H2OContext(sc).start()

/tmp/modestov/spark/work/spark-5695a33c-905d-4af5-a719-88b7be0e0c45/userFiles-77e075c2-41cc-44d6-96fb-a2668b112133/pySparkling-1.6.1-py2.7.egg/pysparkling/context.py in __init__(self, sparkContext)
     70     def __init__(self, sparkContext):
     71         try:
---> 72             self._do_init(sparkContext)
     73             # Hack H2OFrame from h2o package
     74             _monkey_patch_H2OFrame(self)

/tmp/modestov/spark/work/spark-5695a33c-905d-4af5-a719-88b7be0e0c45/userFiles-77e075c2-41cc-44d6-96fb-a2668b112133/pySparkling-1.6.1-py2.7.egg/pysparkling/context.py in _do_init(self, sparkContext)
     94         gw = self._gw
     95
---> 96         self._jhc = jvm.org.apache.spark.h2o.H2OContext.getOrCreate(sc._jsc)
     97         self._client_ip = None
     98         self._client_port = None

TypeError: 'JavaPackage' object is not callable

What does it mean? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SQL-Driver-tp26800.html
error "Py4JJavaError: An error occurred while calling z:org.apache.spark.sql.execution.EvaluatePython.takeAndServe."
I get this error. Does anyone know what it means?

Py4JJavaError: An error occurred while calling z:org.apache.spark.sql.execution.EvaluatePython.takeAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Exception while getting task result: org.apache.spark.storage.BlockFetchException: Failed to fetch block from 1 locations. Most recent failure cause:
  at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
  at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
  at scala.Option.foreach(Option.scala:236)
  at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
  at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
  at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:1952)
  at org.apache.spark.rdd.RDD$$anonfun$reduce$1.apply(RDD.scala:1025)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
  at org.apache.spark.rdd.RDD.reduce(RDD.scala:1007)
  at org.apache.spark.rdd.RDD$$anonfun$takeOrdered$1.apply(RDD.scala:1397)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
  at org.apache.spark.rdd.RDD.takeOrdered(RDD.scala:1384)
  at org.apache.spark.sql.execution.TakeOrderedAndProject.collectData(basicOperators.scala:213)
  at org.apache.spark.sql.execution.TakeOrderedAndProject.doExecute(basicOperators.scala:223)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
  at org.apache.spark.sql.execution.Union$$anonfun$doExecute$1.apply(basicOperators.scala:144)
  at org.apache.spark.sql.execution.Union$$anonfun$doExecute$1.apply(basicOperators.scala:144)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
  at scala.collection.immutable.List.foreach(List.scala:318)
  at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
  at scala.collection.AbstractTraversable.map(Traversable.scala:105)
  at org.apache.spark.sql.execution.Union.doExecute(basicOperators.scala:144)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
  at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
  at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:187)
  at org.apache.spark.sql.execution.EvaluatePython$$anonfun$takeAndServe$1.apply$mcI$sp(python.scala:126)
  at org.apache.spark.sql.execution.EvaluatePython$$anonfun$takeAndServe$1.apply(python.scala:124)
  at org.apache.spark.sql.execution.EvaluatePython$$anonfun$takeAndServe$1.apply(python.scala:124)
  at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:56)
  at org.apache.spark.sql.DataFrame.withNewExecutionId(DataFrame.scala:2086)
  at
Re: spark.driver.extraClassPath and export SPARK_CLASSPATH
I wrote in "spark-defaults.conf" spark.driver.extraClassPath '/dir' or "PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook" /.../sparkling-water-1.6.1/bin/pysparkling \ --conf spark.driver.extraClassPath='/.../sqljdbc41.jar' Nothing works -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-driver-extraClassPath-and-export-SPARK-CLASSPATH-tp26740p26774.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org
An error occurred while calling z:org.apache.spark.sql.execution.EvaluatePython.takeAndServe
I get an error while I build a DataFrame from a parquet file: Py4JJavaError: An error occurred while calling z:org.apache.spark.sql.execution.EvaluatePython.takeAndServe. : org.apache.spark.SparkException: Job aborted due to stage failure: Exception while getting task result: org.apache.spark.storage.BlockFetchException: Failed to fetch block from 1 locations. Most recent failure cause: -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/An-error-occurred-while-calling-z-org-apache-spark-sql-execution-EvaluatePython-takeAndServe-tp26764.html
spark.driver.extraClassPath and export SPARK_CLASSPATH
Hello, I've started to use Spark 1.6.1 (before that I used Spark 1.5). I set export SPARK_CLASSPATH="/SQLDrivers/sqljdbc_4.2/enu/sqljdbc41.jar" when I launched pysparkling and it worked well. But version 1.6.1 warns that this is deprecated and that I have to use spark.driver.extraClassPath instead. OK, now the line spark.driver.extraClassPath /SQLDrivers/sqljdbc_4.2/enu/sqljdbc41.jar is in spark-defaults.conf, but Spark says there is no suitable driver for working with SQL Server. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-driver-extraClassPath-and-export-SPARK-CLASSPATH-tp26740.html
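For comparison, a minimal sketch of a JDBC read that names the SQL Server driver class explicitly (the URL, table, and credentials are placeholders; sqljdbc41.jar still has to be on the driver classpath, e.g. via spark.driver.extraClassPath or --jars, for the class to be found):

df = (sqlContext.read.format("jdbc")
      .option("url", "jdbc:sqlserver://myhost:1433;databaseName=mydb")   # placeholder connection string
      .option("dbtable", "dbo.my_table")                                 # placeholder table name
      .option("user", "...")
      .option("password", "...")
      .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")  # avoids the "No suitable driver" lookup
      .load())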
Spark demands HiveContext but I use only SQLContext
Hello! I work with SQLContext: I create a query to MS SQL Server and get data. Spark tells me that I have to install Hive. I have started to use Spark 1.6.1 (before that I used Spark 1.5 and I never ran into this requirement). Py4JJavaError: An error occurred while calling None.org.apache.spark.sql.hive.HiveContext. : java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-demands-HiveContext-but-I-use-only-SqlContext-tp26738.html
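One common reason is that the pyspark shell in 1.x creates a HiveContext as the default sqlContext when Hive classes are on the classpath. A minimal sketch of building a plain SQLContext instead, so nothing touches the Hive metastore (this sidesteps the error rather than fixing the Hive setup; connection details are placeholders):

from pyspark.sql import SQLContext

sqlContext = SQLContext(sc)        # plain SQLContext, no Hive metastore involved
df = sqlContext.read.format("jdbc").options(
    url="jdbc:sqlserver://myhost:1433;databaseName=mydb",    # placeholder connection string
    dbtable="dbo.my_table",                                  # placeholder table name
    driver="com.microsoft.sqlserver.jdbc.SQLServerDriver").load()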
SQL functions: row_number, percent_rank, rank, rowNumber
Hello all, I'm trying to use some SQL functions. My task is to number the rows in a DataFrame. I use the SQL functions but they don't work and I don't understand why. I would appreciate your help with this issue. Thank you! The piece of my code:

from pyspark.sql.functions import row_number, percent_rank, rank, randn, rowNumber
res_sorted.select(rowNumber()).head(10)

res_sorted is a sorted DataFrame. The error is:

AnalysisException: u"unresolved operator 'Project ['row_number() AS 'row_number()#2848];"

-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/sql-functions-row-number-percent-rank-rank-rowNumber-tp26448.html
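row_number() and the other ranking functions are window functions, so they need an OVER clause (a window specification); calling them bare produces exactly this unresolved-operator error. A minimal sketch, assuming res_sorted has a hypothetical column named "score" to order by (in Spark 1.x, window functions also require a HiveContext):

from pyspark.sql import Window
from pyspark.sql.functions import row_number

w = Window.orderBy("score")                               # the ordering that defines the row numbers
numbered = res_sorted.withColumn("rn", row_number().over(w))
numbered.select("rn", "score").show(10)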
spark.driver.maxResultSize doesn't work in conf-file
I have the line spark.driver.maxResultSize=0 in spark-defaults.conf, but I get the error: "org.apache.spark.SparkException: Job aborted due to stage failure: Total size of serialized results of 18 tasks (1070.5 MB) is bigger than spark.driver.maxResultSize (1024.0 MB)". If I pass --conf spark.driver.maxResultSize=0 to the pyspark shell instead, it works fine. Does anyone know how to fix this? Thank you. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-driver-maxResultSize-doesn-t-work-in-conf-file-tp26279.html
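A small debugging sketch (not a fix): check from the running session which value the driver actually received; if it still reports the 1g default, the spark-defaults.conf that was edited is probably not the one the launcher reads (check SPARK_HOME / SPARK_CONF_DIR), or something later overrides it:

# sc is the running SparkContext.
print(sc.getConf().get("spark.driver.maxResultSize", "not set (defaults to 1g)"))
# Setting it programmatically also works, but only if done before the SparkContext is created:
# SparkConf().set("spark.driver.maxResultSize", "0")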
An error when I read data from parquet
Hello everybody, I use both the Python API and the Scala API. I read the data without problems with the Python API:

sqlContext = SQLContext(sc)
data_full = sqlContext.read.parquet("---")

But when I use Scala:

val sqlContext = new SQLContext(sc)
val data_full = sqlContext.read.parquet("---")

I get the error (I use Spark Notebook, maybe that is important):

java.lang.ExceptionInInitializerError
  at sun.misc.Unsafe.ensureClassInitialized(Native Method)
  at sun.reflect.UnsafeFieldAccessorFactory.newFieldAccessor(UnsafeFieldAccessorFactory.java:43)
  at sun.reflect.ReflectionFactory.newFieldAccessor(ReflectionFactory.java:140)
  at java.lang.reflect.Field.acquireFieldAccessor(Field.java:1057)
  at java.lang.reflect.Field.getFieldAccessor(Field.java:1038)
  at java.lang.reflect.Field.get(Field.java:379)
  at notebook.kernel.Repl.getModule$1(Repl.scala:203)
  at notebook.kernel.Repl.iws$1(Repl.scala:212)
  at notebook.kernel.Repl.liftedTree1$1(Repl.scala:219)
  at notebook.kernel.Repl.evaluate(Repl.scala:199)
  at notebook.client.ReplCalculator$$anonfun$15$$anon$1$$anonfun$29.apply(ReplCalculator.scala:378)
  at notebook.client.ReplCalculator$$anonfun$15$$anon$1$$anonfun$29.apply(ReplCalculator.scala:375)
  at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
  at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
  at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:40)
  at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
  at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
  at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
  at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
  at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.NoSuchMethodException: org.apache.spark.io.SnappyCompressionCodec.<init>(org.apache.spark.SparkConf)
  at java.lang.Class.getConstructor0(Class.java:2892)
  at java.lang.Class.getConstructor(Class.java:1723)
  at org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:71)
  at org.apache.spark.io.CompressionCodec$.createCodec(CompressionCodec.scala:65)
  at org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$setConf(TorrentBroadcast.scala:73)
  at org.apache.spark.broadcast.TorrentBroadcast.<init>(TorrentBroadcast.scala:80)
  at org.apache.spark.broadcast.TorrentBroadcastFactory.newBroadcast(TorrentBroadcastFactory.scala:34)
  at org.apache.spark.broadcast.BroadcastManager.newBroadcast(BroadcastManager.scala:63)
  at org.apache.spark.SparkContext.broadcast(SparkContext.scala:1326)
  at org.apache.spark.sql.execution.datasources.DataSourceStrategy$.apply(DataSourceStrategy.scala:108)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
  at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:396)
  at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:59)
  at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:47)
  at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:45)
  at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:52)
  at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:52)
  at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
  at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
  at org.apache.spark.sql.DataFrame.toJSON(DataFrame.scala:1724)
  at notebook.front.widgets.DataFrameView$class.notebook$front$widgets$DataFrameView$$json(DataFrame.scala:40)
  at notebook.front.widgets.DataFrameWidget.notebook$front$widgets$DataFrameView$$json$lzycompute(DataFrame.scala:64)
  at notebook.front.widgets.DataFrameWidget.notebook$front$widgets$DataFrameView$$json(DataFrame.scala:64)
  at notebook.front.widgets.DataFrameView$class.$init$(DataFrame.scala:41)
  at notebook.front.widgets.DataFrameWidget.<init>(DataFrame.scala:69)
  at notebook.front.ExtraLowPriorityRenderers$dataFrameAsTable$.render(renderer.scala:13)
  at notebook.front.ExtraLowPriorityRenderers$dataFrameAsTable$.render(renderer.scala:12)
  at notebook.front.Widget$.fromRenderer(Widget.scala:32)
  at $line19.$rendered$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$.<init>(<console>:92)
Scala from Jupyter
Hello! I want to use Scala from Jupyter (or maybe something else, if you can recommend anything; I mean an IDE). Does anyone know how I can do this? Thank you! -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Scala-from-Jupyter-tp26234.html