Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21977#discussion_r209331503
  
    --- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/python/AggregateInPandasExec.scala
 ---
    @@ -137,13 +135,12 @@ case class AggregateInPandasExec(
     
           val columnarBatchIter = new ArrowPythonRunner(
             pyFuncs,
    -        bufferSize,
    -        reuseWorker,
             PythonEvalType.SQL_GROUPED_AGG_PANDAS_UDF,
             argOffsets,
             aggInputSchema,
             sessionLocalTimeZone,
    -        pythonRunnerConf).compute(projectedRowIter, context.partitionId(), 
context)
    +        pythonRunnerConf,
    +        sparkContext.conf).compute(projectedRowIter, 
context.partitionId(), context)
    --- End diff --
    
    No, we can't normally. The tests should fail, or we need another test for
this part of the Arrow code. cc @BryanCutler (or I've misunderstood
something). Whatever variables are needed from the conf must be extracted
outside of the `map`: while an RDD also has a `conf`, it depends on the `sc`,
which is only defined on the driver.
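    To illustrate the pattern being discussed (a hedged sketch, not Spark code: `DriverOnlyConf` is a stand-in for `SparkConf`, and the names are illustrative only): read the config value into a local `val` on the driver side, so the closure captures only a serializable primitive instead of the driver-only object.

    ```scala
    // Sketch: extract needed values from a driver-only conf BEFORE the
    // closure, so the closure never references the conf object itself.
    class DriverOnlyConf(settings: Map[String, String]) {
      def getInt(key: String, default: Int): Int =
        settings.get(key).map(_.toInt).getOrElse(default)
    }

    object ExtractOutsideMap {
      def main(args: Array[String]): Unit = {
        val conf = new DriverOnlyConf(Map("spark.buffer.size" -> "65536"))

        // Extracted on the "driver" side, outside any closure.
        val bufferSize = conf.getInt("spark.buffer.size", 65536)

        // The closure captures only bufferSize (an Int), not `conf`;
        // in real Spark, capturing sc.conf here would fail at runtime
        // because SparkContext exists only on the driver.
        val out = (1 to 3).map(i => i * bufferSize)
        println(out.mkString(","))
      }
    }
    ```

    The same idea applies to the diff above: pass the already-extracted values (or a serializable conf copy) into `ArrowPythonRunner` rather than touching `sc` inside the task.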


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org