Re: PySpark, spill-related (possibly psutil) issue, throwing an exception '_fill_function() takes exactly 4 arguments (5 given)'

2016-03-06 Thread Shixiong(Ryan) Zhu
Could you rebuild the whole project? I changed the Python function
serialization format in https://github.com/apache/spark/pull/11535 to fix a
bug. This exception looks like some place is still using the old code.

On Sun, Mar 6, 2016 at 6:24 PM, Hyukjin Kwon  wrote:

> Just in case: my Python version is 2.7.10.
>
> 2016-03-07 11:19 GMT+09:00 Hyukjin Kwon :
>
>> Hi all,
>>
>> While testing some code in PySpark, I ran into a weird issue.
>>
>> This works fine on Spark 1.6.0, but it looks like it does not on Spark 2.0.0.
>>
>> When I simply run *logData = sc.textFile(path).coalesce(1)* with some
>> big files in standalone local mode without HDFS, it throws the
>> exception:
>>
>>
>> *_fill_function() takes exactly 4 arguments (5 given)*
>>
>>
>> I initially wanted to open a JIRA for this, but it is such basic
>> functionality that I felt I might be doing something wrong.
>>
>>
>>
>> The full error message is below:
>>
>> 16/03/07 11:12:44 INFO rdd.HadoopRDD: Input split:
>> file:/Users/hyukjinkwon/Desktop/workspace/local/spark-local-ade/spark/data/00_REF/2016011900-20160215235900-TROI_STAT_ADE_0.DAT:2415919104+33554432
>> *16/03/07 11:12:44 INFO rdd.HadoopRDD: Input split:
>> file:/Users/hyukjinkwon/Desktop/workspace/local/spark-local-ade/spark/data/00_REF/2016011900-20160215235900-TROI_STAT_ADE_0.DAT:805306368+33554432*
>> *16/03/07 11:12:44 INFO rdd.HadoopRDD: Input split:
>> file:/Users/hyukjinkwon/Desktop/workspace/local/spark-local-ade/spark/data/00_REF/2016011900-20160215235900-TROI_STAT_ADE_0.DAT:0+33554432*
>> *16/03/07 11:12:44 INFO rdd.HadoopRDD: Input split:
>> file:/Users/hyukjinkwon/Desktop/workspace/local/spark-local-ade/spark/data/00_REF/2016011900-20160215235900-TROI_STAT_ADE_0.DAT:1610612736+33554432*
>> *16/03/07 11:12:44 ERROR executor.Executor: Exception in task 2.0 in
>> stage 0.0 (TID 2)*
>> *org.apache.spark.api.python.PythonException: Traceback (most recent call
>> last):*
>> *  File "./python/pyspark/worker.py", line 98, in main*
>> *command = pickleSer._read_with_length(infile)*
>> *  File "./python/pyspark/serializers.py", line 164, in _read_with_length*
>> *return self.loads(obj)*
>> *  File "./python/pyspark/serializers.py", line 422, in loads*
>> *return pickle.loads(obj)*
>> *TypeError: ('_fill_function() takes exactly 4 arguments (5 given)',
>> , (> 0x10612c488>, {'defaultdict': ,
>> 'get_used_memory': , 'pack_long':
>> }, None, {}, 'pyspark.rdd'))*
>>
>> * at
>> org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:168)*
>> * at
>> org.apache.spark.api.python.PythonRunner$$anon$1.(PythonRDD.scala:209)*
>> * at
>> org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:127)*
>> * at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:62)*
>> * at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)*
>> * at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)*
>> * at org.apache.spark.api.python.PairwiseRDD.compute(PythonRDD.scala:349)*
>> * at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)*
>> * at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)*
>> * at
>> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:77)*
>> * at
>> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:45)*
>> * at org.apache.spark.scheduler.Task.run(Task.scala:82)*
>> * at
>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)*
>> * at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)*
>> * at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)*
>> * at java.lang.Thread.run(Thread.java:745)*
>> *16/03/07 11:12:44 ERROR executor.Executor: Exception in task 3.0 in
>> stage 0.0 (TID 3)*
>> *org.apache.spark.api.python.PythonException: Traceback (most recent call
>> last):*
>> *  File "./python/pyspark/worker.py", line 98, in main*
>> *command = pickleSer._read_with_length(infile)*
>> *  File "./python/pyspark/serializers.py", line 164, in _read_with_length*
>> *return self.loads(obj)*
>> *  File "./python/pyspark/serializers.py", line 422, in loads*
>> *return pickle.loads(obj)*
>> *TypeError: ('_fill_function() takes exactly 4 arguments (5 given)',
>> , (> 0x10612c488>, {'defaultdict': ,
>> 'get_used_memory': , 'pack_long':
>> }, None, {}, 'pyspark.rdd'))*
>>
>>
>> Thanks!
>>
>
>


Re: GraphX optimizations

2016-03-06 Thread Takeshi Yamamuro
Hi,

The mapReduceTriplets API you mentioned has been removed in master, and you
need to use the newer API, aggregateMessages, instead (see SPARK-3936 and
SPARK-12995 for details).
The memory-based shuffling optimization is a topic not only for GraphX but
for Spark itself; see SPARK-3376 for related discussions.
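
For example, a rough sketch of the replacement (an illustrative fragment only,
assuming an existing graph: Graph[Int, Int] and using degree counting as the
aggregation):

import org.apache.spark.graphx._

// Send a 1 along each edge and sum the messages per vertex --
// the same kind of aggregation mapReduceTriplets used to express.
val degrees: VertexRDD[Int] = graph.aggregateMessages[Int](
  sendMsg = ctx => { ctx.sendToSrc(1); ctx.sendToDst(1) },
  mergeMsg = (a, b) => a + b
)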

Thanks,


On Sat, Mar 5, 2016 at 2:53 AM, Khaled Ammar  wrote:

> Hi all,
>
> I wonder if the optimizations mentioned in the GraphX paper (
> https://amplab.cs.berkeley.edu/wp-content/uploads/2014/09/graphx.pdf )
> are currently implemented. In particular, I am looking for mrTriplets
> optimizations and memory-based shuffle.
>
> --
> Thanks,
> -Khaled
>



-- 
---
Takeshi Yamamuro


Re: PySpark, spill-related (possibly psutil) issue, throwing an exception '_fill_function() takes exactly 4 arguments (5 given)'

2016-03-06 Thread Hyukjin Kwon
Just in case: my Python version is 2.7.10.

2016-03-07 11:19 GMT+09:00 Hyukjin Kwon :

> Hi all,
>
> While testing some code in PySpark, I ran into a weird issue.
>
> This works fine on Spark 1.6.0, but it looks like it does not on Spark 2.0.0.
>
> When I simply run *logData = sc.textFile(path).coalesce(1)* with some big
> files in standalone local mode without HDFS, it throws the
> exception:
>
>
> *_fill_function() takes exactly 4 arguments (5 given)*
>
>
> I initially wanted to open a JIRA for this, but it is such basic
> functionality that I felt I might be doing something wrong.
>
>
>
> The full error message is below:
>
> 16/03/07 11:12:44 INFO rdd.HadoopRDD: Input split:
> file:/Users/hyukjinkwon/Desktop/workspace/local/spark-local-ade/spark/data/00_REF/2016011900-20160215235900-TROI_STAT_ADE_0.DAT:2415919104+33554432
> *16/03/07 11:12:44 INFO rdd.HadoopRDD: Input split:
> file:/Users/hyukjinkwon/Desktop/workspace/local/spark-local-ade/spark/data/00_REF/2016011900-20160215235900-TROI_STAT_ADE_0.DAT:805306368+33554432*
> *16/03/07 11:12:44 INFO rdd.HadoopRDD: Input split:
> file:/Users/hyukjinkwon/Desktop/workspace/local/spark-local-ade/spark/data/00_REF/2016011900-20160215235900-TROI_STAT_ADE_0.DAT:0+33554432*
> *16/03/07 11:12:44 INFO rdd.HadoopRDD: Input split:
> file:/Users/hyukjinkwon/Desktop/workspace/local/spark-local-ade/spark/data/00_REF/2016011900-20160215235900-TROI_STAT_ADE_0.DAT:1610612736+33554432*
> *16/03/07 11:12:44 ERROR executor.Executor: Exception in task 2.0 in stage
> 0.0 (TID 2)*
> *org.apache.spark.api.python.PythonException: Traceback (most recent call
> last):*
> *  File "./python/pyspark/worker.py", line 98, in main*
> *command = pickleSer._read_with_length(infile)*
> *  File "./python/pyspark/serializers.py", line 164, in _read_with_length*
> *return self.loads(obj)*
> *  File "./python/pyspark/serializers.py", line 422, in loads*
> *return pickle.loads(obj)*
> *TypeError: ('_fill_function() takes exactly 4 arguments (5 given)',
> , ( 0x10612c488>, {'defaultdict': ,
> 'get_used_memory': , 'pack_long':
> }, None, {}, 'pyspark.rdd'))*
>
> * at
> org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:168)*
> * at
> org.apache.spark.api.python.PythonRunner$$anon$1.(PythonRDD.scala:209)*
> * at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:127)*
> * at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:62)*
> * at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)*
> * at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)*
> * at org.apache.spark.api.python.PairwiseRDD.compute(PythonRDD.scala:349)*
> * at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)*
> * at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)*
> * at
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:77)*
> * at
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:45)*
> * at org.apache.spark.scheduler.Task.run(Task.scala:82)*
> * at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)*
> * at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)*
> * at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)*
> * at java.lang.Thread.run(Thread.java:745)*
> *16/03/07 11:12:44 ERROR executor.Executor: Exception in task 3.0 in stage
> 0.0 (TID 3)*
> *org.apache.spark.api.python.PythonException: Traceback (most recent call
> last):*
> *  File "./python/pyspark/worker.py", line 98, in main*
> *command = pickleSer._read_with_length(infile)*
> *  File "./python/pyspark/serializers.py", line 164, in _read_with_length*
> *return self.loads(obj)*
> *  File "./python/pyspark/serializers.py", line 422, in loads*
> *return pickle.loads(obj)*
> *TypeError: ('_fill_function() takes exactly 4 arguments (5 given)',
> , ( 0x10612c488>, {'defaultdict': ,
> 'get_used_memory': , 'pack_long':
> }, None, {}, 'pyspark.rdd'))*
>
>
> Thanks!
>


PySpark, spill-related (possibly psutil) issue, throwing an exception '_fill_function() takes exactly 4 arguments (5 given)'

2016-03-06 Thread Hyukjin Kwon
Hi all,

While testing some code in PySpark, I ran into a weird issue.

This works fine on Spark 1.6.0, but it looks like it does not on Spark 2.0.0.

When I simply run *logData = sc.textFile(path).coalesce(1)* with some big
files in standalone local mode without HDFS, it throws the
exception:


*_fill_function() takes exactly 4 arguments (5 given)*


I initially wanted to open a JIRA for this, but it is such basic
functionality that I felt I might be doing something wrong.



The full error message is below:

16/03/07 11:12:44 INFO rdd.HadoopRDD: Input split:
file:/Users/hyukjinkwon/Desktop/workspace/local/spark-local-ade/spark/data/00_REF/2016011900-20160215235900-TROI_STAT_ADE_0.DAT:2415919104+33554432
*16/03/07 11:12:44 INFO rdd.HadoopRDD: Input split:
file:/Users/hyukjinkwon/Desktop/workspace/local/spark-local-ade/spark/data/00_REF/2016011900-20160215235900-TROI_STAT_ADE_0.DAT:805306368+33554432*
*16/03/07 11:12:44 INFO rdd.HadoopRDD: Input split:
file:/Users/hyukjinkwon/Desktop/workspace/local/spark-local-ade/spark/data/00_REF/2016011900-20160215235900-TROI_STAT_ADE_0.DAT:0+33554432*
*16/03/07 11:12:44 INFO rdd.HadoopRDD: Input split:
file:/Users/hyukjinkwon/Desktop/workspace/local/spark-local-ade/spark/data/00_REF/2016011900-20160215235900-TROI_STAT_ADE_0.DAT:1610612736+33554432*
*16/03/07 11:12:44 ERROR executor.Executor: Exception in task 2.0 in stage
0.0 (TID 2)*
*org.apache.spark.api.python.PythonException: Traceback (most recent call
last):*
*  File "./python/pyspark/worker.py", line 98, in main*
*command = pickleSer._read_with_length(infile)*
*  File "./python/pyspark/serializers.py", line 164, in _read_with_length*
*return self.loads(obj)*
*  File "./python/pyspark/serializers.py", line 422, in loads*
*return pickle.loads(obj)*
*TypeError: ('_fill_function() takes exactly 4 arguments (5 given)',
, (, {'defaultdict': ,
'get_used_memory': , 'pack_long':
}, None, {}, 'pyspark.rdd'))*

* at
org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:168)*
* at
org.apache.spark.api.python.PythonRunner$$anon$1.(PythonRDD.scala:209)*
* at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:127)*
* at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:62)*
* at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)*
* at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)*
* at org.apache.spark.api.python.PairwiseRDD.compute(PythonRDD.scala:349)*
* at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:313)*
* at org.apache.spark.rdd.RDD.iterator(RDD.scala:277)*
* at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:77)*
* at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:45)*
* at org.apache.spark.scheduler.Task.run(Task.scala:82)*
* at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)*
* at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)*
* at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)*
* at java.lang.Thread.run(Thread.java:745)*
*16/03/07 11:12:44 ERROR executor.Executor: Exception in task 3.0 in stage
0.0 (TID 3)*
*org.apache.spark.api.python.PythonException: Traceback (most recent call
last):*
*  File "./python/pyspark/worker.py", line 98, in main*
*command = pickleSer._read_with_length(infile)*
*  File "./python/pyspark/serializers.py", line 164, in _read_with_length*
*return self.loads(obj)*
*  File "./python/pyspark/serializers.py", line 422, in loads*
*return pickle.loads(obj)*
*TypeError: ('_fill_function() takes exactly 4 arguments (5 given)',
, (, {'defaultdict': ,
'get_used_memory': , 'pack_long':
}, None, {}, 'pyspark.rdd'))*


Thanks!


Spark Custom Partitioner not picked

2016-03-06 Thread Prabhu Joseph
Hi All,

When I submit a Spark job on YARN with a custom partitioner, it is not
picked up by the executors; they are still using the default HashPartitioner.
I added logging to both HashPartitioner (org/apache/spark/Partitioner.scala)
and the custom partitioner, and the completed executor logs show only
HashPartitioner.

Below is the Spark application code with the custom partitioner and the log
line that was added to the HashPartitioner class of Partitioner.scala:

log.info("HashPartitioner=" + key + "---" + numPartitions + "" +
  Utils.nonNegativeMod(key.hashCode, numPartitions))

The Executor logs has

16/03/06 15:20:27 INFO spark.HashPartitioner: HashPartitioner=INFO---42
16/03/06 15:20:27 INFO spark.HashPartitioner: HashPartitioner=INFO---42


How can I make sure the executors are picking up the right partitioner?



*Code:*
package org.apache.spark

class ExactPartitioner(partitions: Int) extends Partitioner with Logging{

  def numPartitions: Int = partitions

  def getPartition(key: Any): Int = {

    log.info("ExactPartitioner=" + key)

    key match {
      case "INFO"  => 0
      case "DEBUG" => 1
      case "ERROR" => 2
      case "WARN"  => 3
      case "FATAL" => 4
      // Fall back for any unexpected key to avoid a MatchError at runtime
      case _       => 0
    }
  }
}

object GroupByCLDB {

def main(args: Array[String]) {

val logFile = "/DATA"

val sparkConf = new SparkConf().setAppName("GroupBy")
sparkConf.set("spark.executor.memory","4g");
sparkConf.set("spark.executor.cores","2");
sparkConf.set("spark.executor.instances","2");

val sc = new SparkContext(sparkConf)
val logData = sc.textFile(logFile)


case class LogClass(one:String,two:String)

def parse(line: String) = {
  val pieces = line.split(' ')
  val level = pieces(2).toString
  val one = pieces(0).toString
  val two = pieces(1).toString
  (level,LogClass(one,two))
  }

val output = logData.map(x => parse(x))

val partitioned = output.partitionBy(new ExactPartitioner(5)).persist()
val groups = partitioned.groupByKey(new ExactPartitioner(5))
groups.count()

output.partitions.size
partitioned.partitions.size

}
}
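
For reference, a quick driver-side check of which partitioner an RDD reports
(a rough sketch, assuming the ExactPartitioner class above and an existing
SparkContext sc):

// Inspect RDD.partitioner directly instead of relying only on executor logs.
val sample = sc.parallelize(Seq(("INFO", 1), ("ERROR", 2), ("WARN", 3)))
val byExact = sample.partitionBy(new ExactPartitioner(5))
println(sample.partitioner)   // None: a plain RDD carries no partitioner
println(byExact.partitioner)  // Some(ExactPartitioner) if it took effect
println(byExact.groupByKey(new ExactPartitioner(5)).partitioner)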



Thanks,
Prabhu Joseph


Re: [VOTE] Release Apache Spark 1.6.1 (RC1)

2016-03-06 Thread Egor Pahomov
+1

Spark ODBC server is fine, SQL is fine.

2016-03-03 12:09 GMT-08:00 Yin Yang :

> Skipping docker tests, the rest are green:
>
> [INFO] Spark Project External Kafka ... SUCCESS [01:28
> min]
> [INFO] Spark Project Examples . SUCCESS [02:59
> min]
> [INFO] Spark Project External Kafka Assembly .. SUCCESS [
> 11.680 s]
> [INFO]
> 
> [INFO] BUILD SUCCESS
> [INFO]
> 
> [INFO] Total time: 02:16 h
> [INFO] Finished at: 2016-03-03T11:17:07-08:00
> [INFO] Final Memory: 152M/4062M
>
> On Thu, Mar 3, 2016 at 8:55 AM, Yin Yang  wrote:
>
>> When I ran the test suite using the following command:
>>
>> build/mvn clean -Phive -Phive-thriftserver -Pyarn -Phadoop-2.6
>> -Dhadoop.version=2.7.0 package
>>
>> I got a failure in Spark Project Docker Integration Tests:
>>
>> 16/03/02 17:36:46 INFO RemoteActorRefProvider$RemotingTerminator: Remote
>> daemon shut down; proceeding with flushing remote transports.
>> *** RUN ABORTED ***
>>   com.spotify.docker.client.DockerException:
>> java.util.concurrent.ExecutionException:
>> com.spotify.docker.client.shaded.javax.ws.rs.ProcessingException: java.io.
>> IOException: No such file or directory
>>   at
>> com.spotify.docker.client.DefaultDockerClient.propagate(DefaultDockerClient.java:1141)
>>   at
>> com.spotify.docker.client.DefaultDockerClient.request(DefaultDockerClient.java:1082)
>>   at
>> com.spotify.docker.client.DefaultDockerClient.ping(DefaultDockerClient.java:281)
>>   at
>> org.apache.spark.sql.jdbc.DockerJDBCIntegrationSuite.beforeAll(DockerJDBCIntegrationSuite.scala:76)
>>   at
>> org.scalatest.BeforeAndAfterAll$class.beforeAll(BeforeAndAfterAll.scala:187)
>>   at
>> org.apache.spark.sql.jdbc.DockerJDBCIntegrationSuite.beforeAll(DockerJDBCIntegrationSuite.scala:58)
>>   at
>> org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:253)
>>   at
>> org.apache.spark.sql.jdbc.DockerJDBCIntegrationSuite.run(DockerJDBCIntegrationSuite.scala:58)
>>   at
>> org.scalatest.Suite$class.callExecuteOnSuite$1(Suite.scala:1492)
>>   at
>> org.scalatest.Suite$$anonfun$runNestedSuites$1.apply(Suite.scala:1528)
>>   ...
>>   Cause: java.util.concurrent.ExecutionException:
>> com.spotify.docker.client.shaded.javax.ws.rs.ProcessingException:
>> java.io.IOException: No such file or directory
>>   at
>> jersey.repackaged.com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
>>   at
>> jersey.repackaged.com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
>>   at
>> jersey.repackaged.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
>>   at
>> com.spotify.docker.client.DefaultDockerClient.request(DefaultDockerClient.java:1080)
>>   at
>> com.spotify.docker.client.DefaultDockerClient.ping(DefaultDockerClient.java:281)
>>   at
>> org.apache.spark.sql.jdbc.DockerJDBCIntegrationSuite.beforeAll(DockerJDBCIntegrationSuite.scala:76)
>>   at
>> org.scalatest.BeforeAndAfterAll$class.beforeAll(BeforeAndAfterAll.scala:187)
>>   at
>> org.apache.spark.sql.jdbc.DockerJDBCIntegrationSuite.beforeAll(DockerJDBCIntegrationSuite.scala:58)
>>   at
>> org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:253)
>>   at
>> org.apache.spark.sql.jdbc.DockerJDBCIntegrationSuite.run(DockerJDBCIntegrationSuite.scala:58)
>>   ...
>>   Cause:
>> com.spotify.docker.client.shaded.javax.ws.rs.ProcessingException:
>> java.io.IOException: No such file or directory
>>   at
>> org.glassfish.jersey.apache.connector.ApacheConnector.apply(ApacheConnector.java:481)
>>   at
>> org.glassfish.jersey.apache.connector.ApacheConnector$1.run(ApacheConnector.java:491)
>>   at
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>>   at
>> jersey.repackaged.com.google.common.util.concurrent.MoreExecutors$DirectExecutorService.execute(MoreExecutors.java:299)
>>   at
>> java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:110)
>>   at
>> jersey.repackaged.com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:50)
>>   at
>> jersey.repackaged.com.google.common.util.concurrent.AbstractListeningExecutorService.submit(AbstractListeningExecutorService.java:37)
>>   at
>> 

Re: Fwd: spark master ui to proxy app and worker ui

2016-03-06 Thread Gurvinder Singh
I wonder if anyone has any feedback on this. I can look into implementing it,
but would like to know whether such functionality could be merged back into
master. If yes, then please let me know and point me in the right direction
to get started.

Regards,
Gurvinder
On 03/04/2016 09:25 AM, Gurvinder Singh wrote:
> Forwarding to the development mailing list, as it might be more relevant
> to ask for this here. I am wondering if I missed something in the
> documentation and this is already possible. If yes, then please point me
> to the documentation on how to achieve it. If not, would it make sense to
> implement it?
> 
> Thanks,
> Gurvinder
> 
> 
>  Forwarded Message 
> Subject: spark master ui to proxy app and worker ui
> Date: Thu, 3 Mar 2016 20:12:07 +0100
> From: Gurvinder Singh 
> To: user 
> 
> Hi,
> 
> I am wondering if it is possible for the Spark standalone master UI to
> proxy the app/driver UI and worker UI. The reason for this is that
> currently, if you want to access the UI of a driver or worker to see its
> logs, you need access to its IP:port, which is harder to open up from a
> networking point of view. So operationally it makes life easier if the
> master can simply proxy those connections and allow access to both app and
> worker UI details from the master UI itself.
> 
> The master does not need to have content streamed to it all the time; only
> when a user wants to access content from the other UIs does it proxy the
> request/response for that duration. Thus the master does not incur extra
> load all the time.
> 
> Thanks,
> Gurvinder
> 

