Re: [VOTE] Apache SystemML 1.2.0 (RC1)

2018-08-21 Thread Anthony Thomas
+1

I ran the Python test suite on Red Hat Linux under Spark 2.2.0 (Python
2.7.5) and encountered no errors.

Regards,
Anthony

On Tue, Aug 21, 2018 at 7:49 AM Guobao Li  wrote:

> +1
>
> As the initiator and a user of the paramserv function, I launched several
> tests on my local machine with a script that uses paramserv without MKL,
> and no bugs were observed.
>
> Regards,
> Guobao
>
> On Sun, Aug 19, 2018 at 8:09 PM Matthias Boehm  wrote:
>
> > +1
> >
> > I ran the perftest suite multiple times up to 80GB with and without
> > codegen. After fixing all the issues and regressions, the entire suite
> > ran successfully against Spark 2.2 and 2.3 and all use cases showed
> > equal or better performance compared to SystemML 1.1.
> >
> > Regards,
> > Matthias
> >
> > On Fri, Aug 17, 2018 at 8:41 AM, Berthold Reinwald 
> > wrote:
> > > Please vote on releasing the following candidate as Apache SystemML
> > > version 1.2.0
> > >
> > > The vote is open for at least 72 hours and passes if a majority of at
> > > least 3 +1 PMC votes are cast.
> > >
> > > [ ] +1 Release this package as Apache SystemML 1.2.0
> > > [ ] -1 Do not release this package because ...
> > >
> > > To learn more about Apache SystemML, please see
> > > http://systemml.apache.org/
> > >
> > >
> > > The tag to be voted on is v1.2.0-rc1 (
> > > a1a05e29f6ee78f3c33fea355f62c78ce21766ee):
> > > https://github.com/apache/systemml/tree/v1.2.0-rc1
> > >
> > >
> > > The release artifacts can be found at:
> > > https://dist.apache.org/repos/dist/dev/systemml/1.2.0-rc1/
> > >
> > >
> > > The maven release artifacts, including signatures, digests, etc. can be
> > > found at:
> > >
> >
> https://repository.apache.org/content/repositories/orgapachesystemml-1030/org/apache/systemml/systemml/1.2.0/
> > >
> > >
> > >
> > > ===
> > > == Apache Release policy ==
> > > ===
> > > http://www.apache.org/legal/release-policy.html
> > >
> > >
> > > ===
> > > == How can I help test this release? ==
> > > ===
> > > If you are a SystemML user, you can help us test this release by taking an
> > > existing algorithm or workload, running it on this release candidate, and
> > > reporting any regressions.
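> > >
> > > For example, running one of the existing algorithm scripts against the
> > > release candidate could look roughly like this (the input/output paths are
> > > placeholders; LinearRegDS.dml is one of the shipped algorithm scripts):
> > >
> > > $SPARK_HOME/bin/spark-submit SystemML.jar -f scripts/algorithms/LinearRegDS.dml -nvargs X=data/X.csv Y=data/y.csv B=out/weights.csv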
> > >
> > > 
> > > == What justifies a -1 vote for this release? ==
> > > 
> > > -1 votes should only occur for significant stop-ship bugs or legal-related
> > > issues (e.g., wrong license, missing header files, etc.). Minor bugs or
> > > regressions should not block this release.
> > >
> > >
> > >
> > > Regards,
> > > Berthold Reinwald
> > > IBM Almaden Research Center
> > > office: (408) 927 2208; T/L: 457 2208
> > > e-mail: reinw...@us.ibm.com
> > >
> >
>


Re: Passing a CoordinateMatrix to SystemML

2017-12-24 Thread Anthony Thomas
On Sun, Dec 24, 2017 at 3:14 AM, Matthias Boehm <mboe...@gmail.com> wrote:

> Thanks again for catching this issue, Anthony - the IJV reblock issue with
> large ultra-sparse matrices is now fixed in master. It likely did not show
> up on the 1% sample because the data was small enough to be read directly
> into the driver.
>
> However, dataFrameToBinaryBlock might be a separate issue that I have not
> yet been able to reproduce, so it would be very helpful if you could give
> it another try. Thanks.
>
> Regards,
> Matthias
>
>
> On 12/24/2017 9:57 AM, Matthias Boehm wrote:
>
>> Hi Anthony,
>>
>> thanks for helping to debug this issue. There are no limits other than
>> the dimensions and number of non-zeros being of type long. It sounds
>> more like an issue of converting special cases of ultra-sparse
>> matrices. I'll try to reproduce this issue and give an update as soon as
>> I know more. In the meantime, could you please (a) also provide the
>> stacktrace of calling dataFrameToBinaryBlock with SystemML 1.0, and (b)
>> try calling your IJV conversion script via spark-submit to rule out that
>> this issue is API-related? Thanks.
>>
>> Regards,
>> Matthias

Re: Passing a CoordinateMatrix to SystemML

2017-12-23 Thread Anthony Thomas
Okay thanks for the suggestions - I upgraded to 1.0 and tried providing
dimensions and blocksizes to dataFrameToBinaryBlock both without success. I
additionally wrote out the matrix to hdfs in IJV format and am still
getting the same error when calling "read()" directly in the DML. However,
I created a 1% sample of the original data in IJV format and SystemML was
able to read the smaller file without any issue. This would seem to suggest
that either there is some corruption in the full file or I'm running into
some limit. The matrix is on the larger side: 1.9e8 rows by 7e4 cols with
2.4e9 nonzero values, but this seems like it should be well within the
limits of what SystemML/Spark can handle. I also checked for obvious data
errors (e.g., the file not being 1-indexed or containing blank lines). In case
it's helpful, the stacktrace from reading the data from hdfs in IJV format is
included below. Thanks again for your help - I really appreciate it.
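
For reference, the read call producing the error looks roughly like this (the
path is a placeholder; rows and cols are the approximate values above):

X = read("/scratch/X.ijv", rows=190000000, cols=70000, format="text")
print(sum(X))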

 00:24:18 WARN TaskSetManager: Lost task 30.0 in stage 0.0 (TID 126,
10.11.10.13, executor 0): java.lang.ArrayIndexOutOfBoundsException
at java.lang.System.arraycopy(Native Method)
at
org.apache.sysml.runtime.matrix.data.SparseBlockCOO.shiftRightByN(SparseBlockCOO.java:594)
at
org.apache.sysml.runtime.matrix.data.SparseBlockCOO.set(SparseBlockCOO.java:323)
at
org.apache.sysml.runtime.matrix.data.MatrixBlock.mergeIntoSparse(MatrixBlock.java:1790)
at
org.apache.sysml.runtime.matrix.data.MatrixBlock.merge(MatrixBlock.java:1736)
at
org.apache.sysml.runtime.instructions.spark.utils.RDDAggregateUtils$MergeBlocksFunction.call(RDDAggregateUtils.java:627)
at
org.apache.sysml.runtime.instructions.spark.utils.RDDAggregateUtils$MergeBlocksFunction.call(RDDAggregateUtils.java:596)
at
org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction2$1.apply(JavaPairRDD.scala:1037)
at
org.apache.spark.util.collection.ExternalSorter$$anonfun$5.apply(ExternalSorter.scala:189)
at
org.apache.spark.util.collection.ExternalSorter$$anonfun$5.apply(ExternalSorter.scala:188)
at
org.apache.spark.util.collection.AppendOnlyMap.changeValue(AppendOnlyMap.scala:150)
at
org.apache.spark.util.collection.SizeTrackingAppendOnlyMap.changeValue(SizeTrackingAppendOnlyMap.scala:32)
at
org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:194)
at
org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Anthony


On Sat, Dec 23, 2017 at 4:27 AM, Matthias Boehm <mboe...@gmail.com> wrote:

> Given the line numbers in the stacktrace, it seems that you are using a rather
> old version of SystemML. Hence, I would recommend upgrading to SystemML
> 1.0, or at least 0.15, first.
>
> If the error persists or you're not able to upgrade, please try calling
> dataFrameToBinaryBlock with explicitly provided matrix characteristics
> (dimensions and block sizes). The issue you've shown usually originates from
> incorrect metadata (e.g., a negative number of columns or block sizes), which
> prevents the sparse rows from growing to the necessary sizes.
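>
> For example, a rough sketch with explicit matrix characteristics (the
> dimensions here are placeholders, the block sizes use SystemML's default of
> 1000, spark is the SparkSession, and x is the dataset to convert):
>
> import org.apache.spark.api.java.JavaSparkContext
> import org.apache.sysml.runtime.instructions.spark.utils.RDDConverterUtils
> import org.apache.sysml.runtime.matrix.MatrixCharacteristics
>
> val jsc = new JavaSparkContext(spark.sparkContext)
> // rows, cols, rows per block, cols per block
> val mc = new MatrixCharacteristics(190000000L, 70000L, 1000, 1000)
> val xBin = RDDConverterUtils.dataFrameToBinaryBlock(jsc, x, mc, false, true)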
>
> Regards,
> Matthias
>
> On 12/22/2017 10:42 PM, Anthony Thomas wrote:
>
>> Hi Matthias,
>>
>> Thanks for the help! In response to your questions:
>>
>>1. Sorry - this was a typo: the correct schema is: [y: int, features:
>>vector] - the column "features" was created using Spark's VectorAssembler
>>and the underlying type is an org.apache.spark.ml.linalg.SparseVector.
>>Calling x.schema results in: org.apache.spark.sql.types.StructType =
>>StructType(StructField(features,org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7,true)
>>2. "y" converts fine - it appears the only issue is with X. The script
>>still crashes when running "print(sum(X))". The full stack trace is
>>attached at the end of the message.
>>3. Unfortunately, the error persists when calling
>>RDDConverterUtils.dataFrameToBinaryBlock directly.
>>4. Also just in case this matters: I'm packaging the script into a jar
>>using SBT assembly and submitting via spark-submit.
>>
>> Here's an updated script:
>>
>> val input_df = spark.read.parquet(inputPath)
>>

Re: Passing a CoordinateMatrix to SystemML

2017-12-22 Thread Anthony Thomas
org.apache.sysml.api.mlcontext.MLContextException: Exception occurred while
executing runtime program
at org.apache.sysml.api.mlcontext.ScriptExecutor.executeRuntimeProgram(ScriptExecutor.java:390)
at org.apache.sysml.api.mlcontext.ScriptExecutor.execute(ScriptExecutor.java:298)
at org.apache.sysml.api.mlcontext.MLContext.execute(MLContext.java:303)
... 14 more
Caused by: org.apache.sysml.runtime.DMLRuntimeException:
org.apache.sysml.runtime.DMLRuntimeException: ERROR: Runtime error in
program block generated from statement block between lines 1 and 2 -- Error
evaluating instruction: CP°uak+°X·MATRIX·DOUBLE°_Var0·SCALAR·STRING°24
at org.apache.sysml.runtime.controlprogram.Program.execute(Program.java:130)
at org.apache.sysml.api.mlcontext.ScriptExecutor.executeRuntimeProgram(ScriptExecutor.java:388)
... 16 more
...



On Fri, Dec 22, 2017 at 5:48 AM, Matthias Boehm <mboe...@gmail.com> wrote:

> well, let's do the following to figure this out:
>
> 1) If the schema is indeed [label: Integer, features: SparseVector],
> please change the third line to val y = input_data.select("label").
>
> 2) For debugging, I would recommend using a simple script like
> "print(sum(X));" and trying to convert X and y separately to isolate the
> problem.
>
> 3) If it's still failing, it would be helpful to know (a) whether it's an
> issue of converting X, y, or both, as well as (b) the full stacktrace.
>
> 4) As a workaround, you might also call our internal converter directly via
> RDDConverterUtils.dataFrameToBinaryBlock(jsc, df, mc, containsID, isVector),
> where jsc is the java spark context, df is the dataset, mc is the matrix
> characteristics (if unknown, simply use new MatrixCharacteristics()),
> containsID indicates whether the dataset contains a column "__INDEX" with
> the row indexes, and isVector indicates whether the passed dataset contains
> vectors or basic types such as double (see the sketch below).
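>
> A minimal Scala sketch of that call (a rough example; package names may
> differ slightly across SystemML versions; spark is the SparkSession and df
> is the dataset to convert):
>
> import org.apache.spark.api.java.JavaSparkContext
> import org.apache.sysml.runtime.instructions.spark.utils.RDDConverterUtils
> import org.apache.sysml.runtime.matrix.MatrixCharacteristics
>
> val jsc = new JavaSparkContext(spark.sparkContext)  // the java spark context
> val mc = new MatrixCharacteristics()                // dimensions unknown
> // df has no "__INDEX" column and its value column holds ml vectors
> val blocks = RDDConverterUtils.dataFrameToBinaryBlock(jsc, df, mc, false, true)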
>
>
> Regards,
> Matthias
>
>
> On 12/22/2017 12:00 AM, Anthony Thomas wrote:
>
>> Hi SystemML folks,
>>
>> I'm trying to pass some data from Spark to a DML script via the MLContext
>> API. The data is derived from a parquet file containing a dataframe with
>> the schema: [label: Integer, features: SparseVector]. I am doing the
>> following:
>>
>> val input_data = spark.read.parquet(inputPath)
>> val x = input_data.select("features")
>> val y = input_data.select("y")
>> val x_meta = new MatrixMetadata(DF_VECTOR)
>> val y_meta = new MatrixMetadata(DF_DOUBLES)
>> val script = dmlFromFile(s"${script_path}/script.dml").
>> in("X", x, x_meta).
>> in("Y", y, y_meta)
>> ...
>>
>> However, this results in an error from SystemML:
>> java.lang.ArrayIndexOutOfBoundsException: 0
>> I'm guessing this has something to do with SparkML being zero indexed and
>> SystemML being 1 indexed. Is there something I should be doing differently
>> here? Note that I also tried converting the dataframe to a
>> CoordinateMatrix
>> and then creating an RDD[String] in IJV format. That too resulted in
>> "ArrayIndexOutOfBoundsExceptions." I'm guessing there's something simple
>> I'm doing wrong here, but I haven't been able to figure out exactly what.
>> Please let me know if you need more information (I can send along the full
>> error stacktrace if that would be helpful)!
>>
>> Thanks,
>>
>> Anthony
>>
>>


Passing a CoordinateMatrix to SystemML

2017-12-21 Thread Anthony Thomas
Hi SystemML folks,

I'm trying to pass some data from Spark to a DML script via the MLContext
API. The data is derived from a parquet file containing a dataframe with
the schema: [label: Integer, features: SparseVector]. I am doing the
following:

val input_data = spark.read.parquet(inputPath)
val x = input_data.select("features")
val y = input_data.select("y")
val x_meta = new MatrixMetadata(DF_VECTOR)
val y_meta = new MatrixMetadata(DF_DOUBLES)
val script = dmlFromFile(s"${script_path}/script.dml").
in("X", x, x_meta).
in("Y", y, y_meta)
...

However, this results in an error from SystemML:
java.lang.ArrayIndexOutOfBoundsException: 0
I'm guessing this has something to do with SparkML being zero indexed and
SystemML being 1 indexed. Is there something I should be doing differently
here? Note that I also tried converting the dataframe to a CoordinateMatrix
and then creating an RDD[String] in IJV format. That too resulted in
"ArrayIndexOutOfBoundsExceptions." I'm guessing there's something simple
I'm doing wrong here, but I haven't been able to figure out exactly what.
Please let me know if you need more information (I can send along the full
error stacktrace if that would be helpful)!
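
For reference, the IJV conversion I have in mind is roughly the following
sketch (not necessarily the exact code I ran), which shifts Spark's 0-based
indices to the 1-based indices SystemML expects, where x is the "features"
column selected above:

import org.apache.spark.ml.linalg.SparseVector
import org.apache.spark.sql.Row

// flatten each (row, sparse feature vector) pair into 1-based "i j v" lines;
// zipWithIndex assigns row ids in partition order
val ijv = x.rdd.zipWithIndex().flatMap { case (Row(v: SparseVector), i) =>
  v.indices.zip(v.values).map { case (j, value) => s"${i + 1} ${j + 1} $value" }
}
ijv.saveAsTextFile("hdfs:/some/path/X.ijv")  // or hand the RDD[String] to MLContext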

Thanks,

Anthony


SystemML on a single node

2017-07-16 Thread Anthony Thomas
Hi SystemML folks,

Are there any recommended Spark configurations when running SystemML on a
single machine? That is, is there a difference between launching Spark with
master=local[*] (i.e., running SystemML as a standard process in the JVM)
and launching a single-node Spark cluster? If the latter, is there a
recommended balance between driver and executor memory?
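
For concreteness, the two setups I have in mind look roughly like this (memory
sizes, paths, and the script name are placeholders):

# (a) everything in the driver JVM via a local master
$SPARK_HOME/bin/spark-submit --master local[*] --driver-memory 24G \
  SystemML.jar -f script.dml -exec spark

# (b) a single-node standalone cluster: separate driver and executor JVMs
$SPARK_HOME/bin/spark-submit --master spark://localhost:7077 \
  --driver-memory 8G --executor-memory 16G \
  SystemML.jar -f script.dml -exec spark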

Thanks,

Anthony


Unexpected Executor Crash

2017-06-15 Thread Anthony Thomas
Hi SystemML Developers,

I'm running the following simple DML script under SystemML 0.14:

M = read('/scratch/M5.csv')
N = read('/scratch/M5.csv')
MN = M %*% N
if (1 == 1) {
print(as.scalar(MN[1,1]))
}

The matrix M is square and about 5GB on disk (stored in HDFS). I am
submitting the script to a 2 node spark cluster where each physical machine
has 30GB of RAM. I am using the following command to submit the job:

$SPARK_HOME/bin/spark-submit --driver-memory=5G --executor-memory=25G
--conf spark.driver.maxResultSize=0 --conf spark.akka.frameSize=128
--verbose --conf
spark.serializer=org.apache.spark.serializer.KryoSerializer
$SYSTEMML_HOME/SystemML.jar -f example.dml -exec spark -explain runtime

However, I consistently run into errors like:

ERROR TaskSchedulerImpl: Lost executor 1 on 172.31.3.116: Remote RPC client
disassociated. Likely due to containers exceeding thresholds, or network
issues. Check driver logs for WARN messages.

and the job eventually aborts. The executor logs show they are crashing with
OutOfMemory exceptions. Even if one executor needed to store M, N, and MN in
memory simultaneously, it seems like there should be enough memory, so I'm
unsure why the executor is crashing. In addition, I was under the impression
that Spark would spill to disk if there was insufficient memory. I've tried
various combinations of increasing/decreasing the number of executor cores
(from 1 to 8), using more/fewer executors, increasing/decreasing Spark's
memoryFraction, and increasing/decreasing Spark's default parallelism, all
without success. Can anyone offer any advice or suggestions to debug this
issue further? I'm not a very experienced Spark user, so it's very possible
I haven't configured something correctly. Please let me know if you'd like
any further information.
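
For reference, the kinds of settings I varied are roughly the following, passed
as additional flags to the spark-submit command above (the values here are just
examples, not the exact ones from each run):

--conf spark.executor.cores=4 \
--conf spark.default.parallelism=200 \
--conf spark.memory.fraction=0.6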

Best,

Anthony Thomas