Hi Qifan, Any progress here ..?. We just want to make sure that the script could run a dataset whose memory is greater than that of driver.
If it is not scaling, it shows some error. > When I run the script, the cholesky or the inv > is executed completely on the driver, and nothing shows up on Spark UI. > Is that the expected behavior? > This I'm not sure. Thanks, Janardhan On Mon, Apr 23, 2018 at 5:37 AM, Qifan Pu <qifan...@gmail.com> wrote: > and everything before (e.g., I generate the matrix using another DML) was > indeed run by Spark and shows up on the UI. > > On Sun, Apr 22, 2018 at 5:05 PM, Qifan Pu <qifan...@gmail.com> wrote: > > > Thanks Jeremy and Matthias. When I run the script, the cholesky or the > inv > > is executed completely on the driver, and nothing shows up on Spark UI. > > Is that the expected behavior? > > > > On Sun, Apr 22, 2018 at 3:34 PM, Jeremy Nilmeier <nilme...@us.ibm.com> > > wrote: > > > >> Yes, I also spoke with Sasha about this some time last year. Thanks for > >> following up. > >> > >> Cheers, J > >> > >> > >> Jerome Nilmeier, PhD > >> Data Scientist and Engineer > >> IBM Spark Technology Center > >> http://www.spark.tc/ > >> > >> > >> > >> ----- Original message ----- > >> From: Matthias Boehm <mboe...@gmail.com> > >> To: dev@systemml.apache.org > >> Cc: Qifan Pu <qifan...@gmail.com>, Jeremy Nilmeier <nilme...@us.ibm.com > > > >> Subject: Re: distributed cholesky on systemml > >> Date: Sun, Apr 22, 2018 2:41 PM > >> > >> thanks for the context Jeremy - that helps. I also had an offline > >> conversion with Sasha and he pointed me to a script that does exactly > >> that (iterative invert_lower_triangular) combined with a parfor over > >> independent blocks. We'll merge these scripts soon and I'll reach out > >> individually as necessary. Thanks everybody for now. > >> > >> Regards, > >> Matthias > >> > >> On Sun, Apr 22, 2018 at 12:40 PM, Jeremy Nilmeier <nilme...@us.ibm.com> > >> wrote: > >> > This may be a duplicate...it was bounced from the dev list. > >> > > >> > I think that scalable triangular inverse will also have similar > >> properties, > >> > in that there is a sequential approach if it uses back substitution. > >> > > >> > For most of these algorithms (LU, Cholesky, QR), they are inherently > >> > sequential, and the focus of the work is on minimizing interprocess > >> > communication during the operations, which may explain why there was > >> only > >> > limited interest in pursuing this further. > >> > > >> > I had originally recommended that the recursive algorithms be > rewritten > >> as > >> > iterative algorithms (and in fact provided an example of the LU in > >> iterative > >> > form), which would make the counting of operations more transparent, > as > >> well > >> > as revealing possible parallelization points. > >> > > >> > Cheers, J > >> > Jerome Nilmeier, PhD > >> > Data Scientist and Engineer > >> > IBM Spark Technology Center > >> > https://urldefense.proofpoint.com/v2/url?u=http-3A__www.spar > >> k.tc_&d=DwIFaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=3mYOfURw_FSirAnoSv > >> 2pWvLSi1psso4F9RdGjEWL6yc&m=VIdNVaIRvibBlaNVAOXLKmxXf7ma-EXr > >> LWbjMd9Bmgo&s=YktpBBbqor3DKzS90Ah75BF6NBYtE4RauITF7QaL87g&e= > >> > >> > > >> > > >> > > >> > ----- Original message ----- > >> > From: Matthias Boehm <mboe...@gmail.com> > >> > To: dev@systemml.apache.org > >> > Cc: Qifan Pu <qifan...@gmail.com> > >> > Subject: Re: distributed cholesky on systemml > >> > Date: Sun, Apr 22, 2018 1:21 AM > >> > > >> > sure no problem - thanks again for catching this issue that was hidden > >> > for a while. > >> > > >> > Yes, the same depth-first characteristic applies to the Cholesky > >> > function as well. In contrast to U_triangular_inv, however, there are > >> > data dependencies between the blocks per level (at least in the > >> > current algorithm formulation), which means we cannot use the approach > >> > I described for U_triangular_inv. > >> > > >> > L11 = Cholesky(A11, nb) > >> > A22 = ... U_triangular_inv(t(L11)) > >> > L22 = Cholesky(A22, nb) > >> > > >> > However, note that there are much fewer calls to Cholesky due to the > >> > switch to the builtin cholesky according to the given min block size. > >> > For example, in our new test for dimensions 1362 x 1362 and min size > >> > of 200, we call Cholesky 15 times but U_triangular_inv 2539 times. > >> > > >> > For sufficiently large min block size this might be ok for Cholesky, > >> > because each level also does a number of matrix multiplies that will > >> > exploit the available parallelism of your cluster. In that regard. you > >> > might want to experiment with different block sizes and driver memory > >> > budgets. If I get a chance, I will also run a number of experiments > >> > and see if we can rewrite these scripts. > >> > > >> > Regards, > >> > Matthias > >> > > >> > On Sun, Apr 22, 2018 at 12:48 AM, Qifan Pu <qifan...@gmail.com> > wrote: > >> >> Matthias, > >> >> > >> >> Thanks so much for taking time to fix. Really appreciated it. > >> >> Does the same reasoning apply to the cholesky script? The recursive > >> >> approach > >> >> also looks inherently sequential. > >> >> > >> >> Best, > >> >> Qifan > >> >> > >> >> On Sat, Apr 21, 2018 at 11:39 PM, Matthias Boehm <mboe...@gmail.com> > >> >> wrote: > >> >>> > >> >>> just as a quick update: this issue has now been fixed in SystemML > >> >>> master - it was essentially a missing guard for recursive functions > >> >>> when checking for unary size-preserving functions during > >> >>> inter-procedural analysis (IPA). > >> >>> > >> >>> However, while working with this recursive cholesky function I came > to > >> >>> the conclusion that it may need some rework. The current top-down, > >> >>> depth-first, approach is inherently sequential. This is partially > >> >>> unnecessary because for the used recursive function U_triangular_inv > >> >>> (which is called many more times than cholesky), blocks per level > are > >> >>> independent. Therefore, we should look into a bottom-up, > breadth-first > >> >>> approach to parallelize over the blocks in each level, which could > be > >> >>> done via parfor at script level. > >> >>> > >> >>> Regards, > >> >>> Matthias > >> >>> > >> >>> On Sat, Apr 21, 2018 at 6:59 PM, Matthias Boehm <mboe...@gmail.com> > >> >>> wrote: > >> >>> > thanks for catching this - I just ran a toy example and this seems > >> to > >> >>> > be a rewrite issue (there are specific right indexing rewrites > that > >> >>> > collapse U[1:k,1:k] and U[1:k,k+1:n] into a single access to U > which > >> >>> > helps for large distributed matrices). As a workaround, you can > set > >> >>> > "sysml.optlevel" to 1 (instead of default 2, where 1 disables all > >> >>> > rewrites), which worked fine for me. I'll fix this later today. > Also > >> >>> > I'll fix the naming from "Choleskey" to "Cholesky". Thanks again. > >> >>> > > >> >>> > Regards, > >> >>> > Matthias > >> >>> > > >> >>> > > >> >>> > On Sat, Apr 21, 2018 at 6:28 PM, Qifan Pu <qifan...@gmail.com> > >> wrote: > >> >>> >> Hi Matthias, > >> >>> >> > >> >>> >> Thanks for the fast response and detailed information. This is > >> really > >> >>> >> helpful. > >> >>> >> > >> >>> >> I just tried to run it, and was tracing down a indexing bug that > >> can > >> >>> >> be > >> >>> >> repeated by simply running the test script of triangle solve[1] > >> >>> >> Caused by: org.apache.sysml.runtime.DMLRuntimeException: Invalid > >> >>> >> values > >> >>> >> for > >> >>> >> matrix indexing: [1667:3333,1:1666] must be within matrix > >> dimensions > >> >>> >> [1000,1000] > >> >>> >> > >> >>> >> > >> >>> >> Am I missing some configuration here? > >> >>> >> > >> >>> >> > >> >>> >> [1] > >> >>> >> > >> >>> >> > >> >>> >> https://urldefense.proofpoint.com/v2/url?u=https-3A__github. > >> com_apache_systemml_blob_master_scripts_staging_scalable- > >> 5Flinalg_test_test-5Ftriangular-5Finv.dml&d=DwIBaQ&c=jf_ > >> iaSHvJObTbx-siA1ZOg&r=3mYOfURw_FSirAnoSv2pWvLSi1psso > >> 4F9RdGjEWL6yc&m=FvqDr_AKzY5EAD_GAXIJoot0Z09NtMUt8kLS > >> hXcJxqQ&s=zIEgt74yeZzCTqvLCgV_0J8ECApG541uUlbaGMcK8bs&e= > >> >>> >> > >> >>> >> > >> >>> >> Best, > >> >>> >> Qifan > >> >>> >> > >> >>> >> > >> >>> >> On Sat, Apr 21, 2018 at 4:06 PM, Matthias Boehm < > mboe...@gmail.com > >> > > >> >>> >> wrote: > >> >>> >>> > >> >>> >>> Hi Qifan, > >> >>> >>> > >> >>> >>> thanks for your feedback. You're right, the builtin functions > >> >>> >>> cholesky, inverse, eigen, solve, svd, qr, and lu are currently > >> only > >> >>> >>> supported as single-node operations because they're still > >> implemented > >> >>> >>> via Apache commons.math. > >> >>> >>> > >> >>> >>> However, there is an experimental script for distributed > cholesky > >> [1] > >> >>> >>> which uses a recursive approach (with operations that allow for > >> >>> >>> automatic distributed computation) for matrices larger than a > >> >>> >>> user-defined block size. Once blocks become small enough, we use > >> >>> >>> again > >> >>> >>> the builtin cholesky. Graduating this script would require a > >> broader > >> >>> >>> set of experiments (and potential improvements) but it simply > did > >> not > >> >>> >>> have the highest priority so far. You might want to give it a > try > >> >>> >>> though. > >> >>> >>> > >> >>> >>> Thanks again for your feedback - we'll consider a higher > priority > >> for > >> >>> >>> these distributed operations when discussing the roadmap for the > >> next > >> >>> >>> releases. > >> >>> >>> > >> >>> >>> [1] > >> >>> >>> > >> >>> >>> > >> >>> >>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github. > >> com_apache_systemml_blob_master_scripts_staging_scalable- > >> 5Flinalg_cholesky.dml&d=DwIBaQ&c=jf_iaSHvJObTbx-siA1ZO > >> g&r=3mYOfURw_FSirAnoSv2pWvLSi1psso4F9RdGjEWL6yc&m=FvqDr_ > >> AKzY5EAD_GAXIJoot0Z09NtMUt8kLShXcJxqQ&s=Yrj4GGcTlpZGRw34RoON_oO6-xDUti > >> IEUcO7-qIOyoc&e= > >> >>> >>> > >> >>> >>> Regards, > >> >>> >>> Matthias > >> >>> >>> > >> >>> >>> On Sat, Apr 21, 2018 at 2:15 PM, Qifan Pu <qifan...@gmail.com> > >> wrote: > >> >>> >>> > Hi, > >> >>> >>> > > >> >>> >>> > I would love to do distributed cholesky on large matrix with > >> >>> >>> > SystemML. I > >> >>> >>> > found two related jiras (SYSTEMML-1213, SYSTEMML-1163), but > >> AFAIK, > >> >>> >>> > this > >> >>> >>> > is > >> >>> >>> > currently not implemented? I just wanted to check. > >> >>> >>> > > >> >>> >>> > Best, > >> >>> >>> > Qifan > >> >>> >> > >> >>> >> > >> >> > >> >> > >> > > >> > > >> > > >> > > >> > >> > >> > >> > >> > >> > > >