Re: distributed cholesky on systemml

Janardhan Wed, 09 May 2018 01:51:08 -0700

Hi Qifan,

Any progress here? We just want to make sure that the script can handle a
dataset that is larger than the driver's memory.


If it does not scale, it will fail with some error.

> When I run the script, the cholesky or the inv
> is executed completely on the driver, and nothing shows up on Spark UI.
> Is that the expected behavior?
>
I'm not sure about this.

Thanks,
Janardhan

On Mon, Apr 23, 2018 at 5:37 AM, Qifan Pu <qifan...@gmail.com> wrote:

> and everything before (e.g., I generate the matrix using another DML) was
> indeed run by Spark and shows up on the UI.
>
> On Sun, Apr 22, 2018 at 5:05 PM, Qifan Pu <qifan...@gmail.com> wrote:
>
> > Thanks Jeremy and Matthias. When I run the script, the cholesky or the inv
> > is executed completely on the driver, and nothing shows up on Spark UI.
> > Is that the expected behavior?
> >
> > On Sun, Apr 22, 2018 at 3:34 PM, Jeremy Nilmeier <nilme...@us.ibm.com>
> > wrote:
> >
> >> Yes, I also spoke with Sasha about this some time last year.  Thanks for
> >> following up.
> >>
> >> Cheers, J
> >>
> >>
> >> Jerome Nilmeier, PhD
> >> Data Scientist and Engineer
> >> IBM Spark Technology Center
> >> http://www.spark.tc/
> >>
> >>
> >>
> >> ----- Original message -----
> >> From: Matthias Boehm <mboe...@gmail.com>
> >> To: dev@systemml.apache.org
> >> Cc: Qifan Pu <qifan...@gmail.com>, Jeremy Nilmeier <nilme...@us.ibm.com>
> >> Subject: Re: distributed cholesky on systemml
> >> Date: Sun, Apr 22, 2018 2:41 PM
> >>
> >> thanks for the context Jeremy - that helps. I also had an offline
> >> conversation with Sasha and he pointed me to a script that does exactly
> >> that (iterative invert_lower_triangular) combined with a parfor over
> >> independent blocks. We'll merge these scripts soon and I'll reach out
> >> individually as necessary. Thanks everybody for now.
> >>
> >> Regards,
> >> Matthias
> >>
> >> On Sun, Apr 22, 2018 at 12:40 PM, Jeremy Nilmeier <nilme...@us.ibm.com>
> >> wrote:
> >> > This may be a duplicate...it was bounced from the dev list.
> >> >
> >> > I think that scalable triangular inverse will also have similar
> >> > properties, in that there is a sequential approach if it uses back
> >> > substitution.
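
To make the back-substitution point concrete, here is a minimal Python sketch. It is not taken from the SystemML scripts, and the name back_substitute is made up for illustration. Each x[i] needs every x[j] with j > i, so the rows have to be processed one after another.

```python
def back_substitute(U, b):
    """Solve U x = b for an upper-triangular U given as a list of rows."""
    n = len(U)
    x = [0.0] * n
    # Rows are processed from last to first: x[i] cannot be computed
    # until every x[j] with j > i is known, which makes the loop sequential.
    for i in range(n - 1, -1, -1):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / U[i][i]
    return x


U = [[2.0, 1.0, 1.0],
     [0.0, 3.0, 2.0],
     [0.0, 0.0, 4.0]]
b = [9.0, 13.0, 8.0]
print(back_substitute(U, b))  # prints [2.0, 3.0, 2.0]
```
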
> >> >
> >> > Most of these algorithms (LU, Cholesky, QR) are inherently
> >> > sequential, and the focus of the work is on minimizing interprocess
> >> > communication during the operations, which may explain why there was
> >> > only limited interest in pursuing this further.
> >> >
> >> > I had originally recommended that the recursive algorithms be
> >> > rewritten as iterative algorithms (and in fact provided an example of
> >> > the LU in iterative form), which would make the counting of operations
> >> > more transparent, as well as revealing possible parallelization points.
> >> >
> >> > Cheers, J
> >> > Jerome Nilmeier, PhD
> >> > Data Scientist and Engineer
> >> > IBM Spark Technology Center
> >> > http://www.spark.tc/
> >> >
> >> >
> >> >
> >> > ----- Original message -----
> >> > From: Matthias Boehm <mboe...@gmail.com>
> >> > To: dev@systemml.apache.org
> >> > Cc: Qifan Pu <qifan...@gmail.com>
> >> > Subject: Re: distributed cholesky on systemml
> >> > Date: Sun, Apr 22, 2018 1:21 AM
> >> >
> >> > sure no problem - thanks again for catching this issue that was hidden
> >> > for a while.
> >> >
> >> > Yes, the same depth-first characteristic applies to the Cholesky
> >> > function as well. In contrast to U_triangular_inv, however, there are
> >> > data dependencies between the blocks per level (at least in the
> >> > current algorithm formulation), which means we cannot use the approach
> >> > I described for U_triangular_inv.
> >> >
> >> > L11 = Cholesky(A11, nb)
> >> > A22 = ... U_triangular_inv(t(L11))
> >> > L22 = Cholesky(A22, nb)
> >> >
> >> > However, note that there are far fewer calls to Cholesky due to the
> >> > switch to the builtin cholesky according to the given min block size.
> >> > For example, in our new test for dimensions 1362 x 1362 and min size
> >> > of 200, we call Cholesky 15 times but U_triangular_inv 2539 times.
> >> >
> >> > For sufficiently large min block size this might be ok for Cholesky,
> >> > because each level also does a number of matrix multiplies that will
> >> > exploit the available parallelism of your cluster. In that regard, you
> >> > might want to experiment with different block sizes and driver memory
> >> > budgets. If I get a chance, I will also run a number of experiments
> >> > and see if we can rewrite these scripts.
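
For readers following along, the recursion described above can be sketched in plain Python. This is only an illustration under assumptions, not the cholesky.dml script itself: the split at floor(n/2), the list-of-rows matrices, and every function name below are hypothetical. The second recursive call needs the Schur complement built from L11, which is the data dependency between blocks. Under the assumed split, count_calls also reproduces the 15 Cholesky calls quoted for 1362 x 1362 with a minimum size of 200.

```python
import math


def matmul(A, B):
    """Dense product of two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def base_cholesky(A):
    """Unblocked Cholesky; stands in for the builtin single-node cholesky."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = A[j][j] - sum(L[j][k] ** 2 for k in range(j))
        L[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = s / L[j][j]
    return L


def lower_inv(L):
    """Inverse of a lower-triangular matrix by forward substitution."""
    n = len(L)
    X = [[0.0] * n for _ in range(n)]
    for c in range(n):
        for r in range(c, n):
            rhs = 1.0 if r == c else 0.0
            s = sum(L[r][k] * X[k][c] for k in range(c, r))
            X[r][c] = (rhs - s) / L[r][r]
    return X


def blocked_cholesky(A, nb):
    """Recursive 2x2 block Cholesky; uses base_cholesky once n <= nb."""
    n = len(A)
    if n <= nb:
        return base_cholesky(A)
    k = n // 2  # assumed split point
    A11 = [row[:k] for row in A[:k]]
    A21 = [row[:k] for row in A[k:]]
    A22 = [row[k:] for row in A[k:]]
    L11 = blocked_cholesky(A11, nb)
    # L21 = A21 * inv(L11)^T, so it has to wait for L11.
    L21 = matmul(A21, transpose(lower_inv(L11)))
    # Schur complement A22 - L21 * L21^T, so L22 has to wait for L21.
    P = matmul(L21, transpose(L21))
    S = [[a - p for a, p in zip(ra, rp)] for ra, rp in zip(A22, P)]
    L22 = blocked_cholesky(S, nb)
    top = [row + [0.0] * (n - k) for row in L11]
    bottom = [r21 + r22 for r21, r22 in zip(L21, L22)]
    return top + bottom


def count_calls(n, nb):
    """Number of blocked_cholesky invocations for an n x n input."""
    if n <= nb:
        return 1
    k = n // 2
    return 1 + count_calls(k, nb) + count_calls(n - k, nb)


print(count_calls(1362, 200))  # prints 15
```

A larger nb means fewer and larger base-case calls on the driver, while the matrix multiplies at each level are the part that a distributed backend could spread over the cluster.
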
> >> >
> >> > Regards,
> >> > Matthias
> >> >
> >> > On Sun, Apr 22, 2018 at 12:48 AM, Qifan Pu <qifan...@gmail.com> wrote:
> >> >> Matthias,
> >> >>
> >> >> Thanks so much for taking the time to fix this. I really appreciate it.
> >> >> Does the same reasoning apply to the cholesky script? The recursive
> >> >> approach also looks inherently sequential.
> >> >>
> >> >> Best,
> >> >> Qifan
> >> >>
> >> >> On Sat, Apr 21, 2018 at 11:39 PM, Matthias Boehm <mboe...@gmail.com>
> >> >> wrote:
> >> >>>
> >> >>> just as a quick update: this issue has now been fixed in SystemML
> >> >>> master - it was essentially a missing guard for recursive functions
> >> >>> when checking for unary size-preserving functions during
> >> >>> inter-procedural analysis (IPA).
> >> >>>
> >> >>> However, while working with this recursive cholesky function I came
> >> >>> to the conclusion that it may need some rework. The current top-down,
> >> >>> depth-first approach is inherently sequential. This is partially
> >> >>> unnecessary because, for the recursive function U_triangular_inv it
> >> >>> uses (which is called many more times than cholesky), the blocks per
> >> >>> level are independent. Therefore, we should look into a bottom-up,
> >> >>> breadth-first approach to parallelize over the blocks in each level,
> >> >>> which could be done via parfor at script level.
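
As a rough illustration of the bottom-up, breadth-first idea for a triangular inverse, here is a plain Python sketch. It is an assumption-laden toy, not the script mentioned elsewhere in this thread: it works on a lower-triangular matrix stored as a list of rows, all names are hypothetical, and ordinary Python loops stand in for parfor. Diagonal blocks are inverted first, then adjacent blocks are merged level by level, and the items within one level do not depend on each other.

```python
def matmul(A, B):
    """Dense product of two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def sub(M, r0, r1, c0, c1):
    """Copy of the block M[r0:r1, c0:c1]."""
    return [row[c0:c1] for row in M[r0:r1]]


def leaf_inv(L):
    """Invert a small lower-triangular block by forward substitution."""
    n = len(L)
    X = [[0.0] * n for _ in range(n)]
    for c in range(n):
        for r in range(c, n):
            rhs = 1.0 if r == c else 0.0
            s = sum(L[r][k] * X[k][c] for k in range(c, r))
            X[r][c] = (rhs - s) / L[r][r]
    return X


def merge(L, left, right):
    """Combine the inverses of two adjacent diagonal blocks into one."""
    a, b, X1 = left
    _, c, X2 = right
    L21 = sub(L, b, c, a, b)
    # inv([[L11, 0], [L21, L22]]) = [[X1, 0], [-X2 * L21 * X1, X2]]
    X21 = [[-v for v in row] for row in matmul(matmul(X2, L21), X1)]
    top = [row + [0.0] * (c - b) for row in X1]
    bottom = [r21 + r22 for r21, r22 in zip(X21, X2)]
    return (a, c, top + bottom)


def lower_inv_bottom_up(L, nb):
    """Invert lower-triangular L starting from diagonal blocks of size nb."""
    n = len(L)
    bounds = list(range(0, n, nb)) + [n]
    # Level 0: every diagonal block is inverted independently.
    blocks = [(a, b, leaf_inv(sub(L, a, b, a, b)))
              for a, b in zip(bounds, bounds[1:])]
    while len(blocks) > 1:
        # The merges within one level are independent of each other;
        # this is the loop a parfor could run in parallel.
        merged = [merge(L, blocks[i], blocks[i + 1])
                  for i in range(0, len(blocks) - 1, 2)]
        if len(blocks) % 2:
            merged.append(blocks[-1])  # odd block waits for the next level
        blocks = merged
    return blocks[0][2]
```

The number of levels grows with log2 of the number of leaf blocks, so the sequential part is the level loop rather than the individual block calls.
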
> >> >>>
> >> >>> Regards,
> >> >>> Matthias
> >> >>>
> >> >>> On Sat, Apr 21, 2018 at 6:59 PM, Matthias Boehm <mboe...@gmail.com>
> >> >>> wrote:
> >> >>> > thanks for catching this - I just ran a toy example and this seems
> >> >>> > to be a rewrite issue (there are specific right indexing rewrites
> >> >>> > that collapse U[1:k,1:k] and U[1:k,k+1:n] into a single access to U
> >> >>> > which helps for large distributed matrices). As a workaround, you
> >> >>> > can set "sysml.optlevel" to 1 (instead of default 2, where 1
> >> >>> > disables all rewrites), which worked fine for me. I'll fix this
> >> >>> > later today. Also I'll fix the naming from "Choleskey" to
> >> >>> > "Cholesky". Thanks again.
> >> >>> >
> >> >>> > Regards,
> >> >>> > Matthias
> >> >>> >
> >> >>> >
> >> >>> > On Sat, Apr 21, 2018 at 6:28 PM, Qifan Pu <qifan...@gmail.com> wrote:
> >> >>> >> Hi Matthias,
> >> >>> >>
> >> >>> >> Thanks for the fast response and detailed information. This is
> >> >>> >> really helpful.
> >> >>> >>
> >> >>> >> I just tried to run it, and was tracing down an indexing bug that
> >> >>> >> can be reproduced by simply running the test script of triangle
> >> >>> >> solve [1]:
> >> >>> >> Caused by: org.apache.sysml.runtime.DMLRuntimeException: Invalid
> >> >>> >> values for matrix indexing: [1667:3333,1:1666] must be within
> >> >>> >> matrix dimensions [1000,1000]
> >> >>> >>
> >> >>> >>
> >> >>> >> Am I missing some configuration here?
> >> >>> >>
> >> >>> >>
> >> >>> >> [1]
> >> >>> >> https://github.com/apache/systemml/blob/master/scripts/staging/scalable_linalg/test/test_triangular_inv.dml
> >> >>> >>
> >> >>> >>
> >> >>> >> Best,
> >> >>> >> Qifan
> >> >>> >>
> >> >>> >>
> >> >>> >> On Sat, Apr 21, 2018 at 4:06 PM, Matthias Boehm <mboe...@gmail.com>
> >> >>> >> wrote:
> >> >>> >>>
> >> >>> >>> Hi Qifan,
> >> >>> >>>
> >> >>> >>> thanks for your feedback. You're right, the builtin functions
> >> >>> >>> cholesky, inverse, eigen, solve, svd, qr, and lu are currently
> >> >>> >>> only supported as single-node operations because they're still
> >> >>> >>> implemented via Apache Commons Math.
> >> >>> >>>
> >> >>> >>> However, there is an experimental script for distributed
> >> >>> >>> cholesky [1] which uses a recursive approach (with operations that
> >> >>> >>> allow for automatic distributed computation) for matrices larger
> >> >>> >>> than a user-defined block size. Once blocks become small enough,
> >> >>> >>> we again use the builtin cholesky. Graduating this script would
> >> >>> >>> require a broader set of experiments (and potential improvements)
> >> >>> >>> but it simply did not have the highest priority so far. You might
> >> >>> >>> want to give it a try though.
> >> >>> >>>
> >> >>> >>> Thanks again for your feedback - we'll consider a higher priority
> >> >>> >>> for these distributed operations when discussing the roadmap for
> >> >>> >>> the next releases.
> >> >>> >>>
> >> >>> >>> [1]
> >> >>> >>> https://github.com/apache/systemml/blob/master/scripts/staging/scalable_linalg/cholesky.dml
> >> >>> >>>
> >> >>> >>> Regards,
> >> >>> >>> Matthias
> >> >>> >>>
> >> >>> >>> On Sat, Apr 21, 2018 at 2:15 PM, Qifan Pu <qifan...@gmail.com>
> >> >>> >>> wrote:
> >> >>> >>> > Hi,
> >> >>> >>> >
> >> >>> >>> > I would love to do distributed cholesky on a large matrix with
> >> >>> >>> > SystemML. I found two related JIRAs (SYSTEMML-1213,
> >> >>> >>> > SYSTEMML-1163), but AFAIK, this is currently not implemented? I
> >> >>> >>> > just wanted to check.
> >> >>> >>> >
> >> >>> >>> > Best,
> >> >>> >>> > Qifan
