[ https://issues.apache.org/jira/browse/SPARK-32061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
zhengruifeng resolved SPARK-32061.
----------------------------------
    Resolution: Resolved

> potential regression if use memoryUsage instead of numRows
> ----------------------------------------------------------
>
>                 Key: SPARK-32061
>                 URL: https://issues.apache.org/jira/browse/SPARK-32061
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML, PySpark
>    Affects Versions: 3.1.0
>            Reporter: zhengruifeng
>            Priority: Major
>
> 1. If `memoryUsage` is improperly set, it may, for example, be too small to store even a single instance.
> 2. blockify+GMM reuses two matrices whose shape depends on the current blockSize:
> {code:java}
> @transient private lazy val auxiliaryProbMat = DenseMatrix.zeros(blockSize, k)
> @transient private lazy val auxiliaryPDFMat = DenseMatrix.zeros(blockSize, numFeatures)
> {code}
> When implementing blockify+GMM, I found that if I did not pre-allocate those matrices, there was a serious regression (maybe 3~4x slower; I forgot the detailed numbers).
> 3. In MLP, three pre-allocated objects are also sized by numRows:
> {code:java}
> if (ones == null || ones.length != delta.cols) ones = BDV.ones[Double](delta.cols)
> // TODO: allocate outputs as one big array and then create BDMs from it
> if (outputs == null || outputs(0).cols != currentBatchSize) {
> ...
> // TODO: allocate deltas as one big array and then create BDMs from it
> if (deltas == null || deltas(0).cols != currentBatchSize) {
>   deltas = new Array[BDM[Double]](layerModels.length)
> ...
> {code}
> I am not very familiar with the implementation of MLP and could not find documentation about this pre-allocation, but I guess there may be a regression if we disable it, since those objects look relatively big.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
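The concern in both snippets is the same pattern: a scratch buffer is lazily allocated once for a fixed block size and reused across blocks, so if blocks were instead split by memoryUsage, their row counts could vary and force a reallocation per block. A minimal, self-contained sketch of that pattern (hypothetical names — `BlockWorker` and `countAllocations` are illustrative, not Spark or Breeze APIs):

```scala
// Sketch of the reuse pattern behind auxiliaryProbMat/auxiliaryPDFMat:
// a per-worker scratch buffer reallocated only when the block size changes.
object ScratchBufferSketch {
  final class BlockWorker(k: Int) {
    private var scratch: Array[Double] = _
    var allocations: Int = 0

    // Reuse the buffer when the block has the same number of rows;
    // reallocate otherwise (this is the cost variable block sizes incur).
    def process(block: Array[Array[Double]]): Unit = {
      val rows = block.length
      if (scratch == null || scratch.length != rows * k) {
        scratch = new Array[Double](rows * k)
        allocations += 1
      }
      // ... fill `scratch` with per-row statistics ...
    }
  }

  def countAllocations(blockSizes: Seq[Int], k: Int): Int = {
    val w = new BlockWorker(k)
    blockSizes.foreach(n => w.process(Array.ofDim[Double](n, 1)))
    w.allocations
  }

  def main(args: Array[String]): Unit = {
    // Fixed blockSize (numRows-based splitting): allocate once, reuse after.
    println(countAllocations(Seq.fill(100)(1024), k = 4))        // prints 1
    // Varying sizes (as memoryUsage-based splitting could produce):
    // every size change triggers a fresh allocation.
    println(countAllocations(Seq(1000, 1017, 998, 1024), k = 4)) // prints 4
  }
}
```

Under a numRows-based blockSize the buffer is allocated once per worker; with size-varying blocks the lazy-val trick degrades to an allocation per block, which is one plausible source of the regression described above.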