[jira] [Assigned] (SYSTEMML-399) Improvements multi-threaded read/write (all formats)
[ https://issues.apache.org/jira/browse/SYSTEMML-399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Boehm reassigned SYSTEMML-399: --- Assignee: Matthias Boehm > Improvements multi-threaded read/write (all formats) > > > Key: SYSTEMML-399 > URL: https://issues.apache.org/jira/browse/SYSTEMML-399 > Project: SystemML > Issue Type: Task > Components: Runtime >Reporter: Matthias Boehm >Assignee: Matthias Boehm > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (SYSTEMML-396) Multi-threaded cumulative aggregates
[ https://issues.apache.org/jira/browse/SYSTEMML-396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Boehm resolved SYSTEMML-396. - Resolution: Done Fix Version/s: SystemML 0.10 > Multi-threaded cumulative aggregates > > > Key: SYSTEMML-396 > URL: https://issues.apache.org/jira/browse/SYSTEMML-396 > Project: SystemML > Issue Type: Task > Components: Compiler, Runtime >Reporter: Matthias Boehm >Assignee: Matthias Boehm > Fix For: SystemML 0.10
[jira] [Assigned] (SYSTEMML-396) Multi-threaded cumulative aggregates
[ https://issues.apache.org/jira/browse/SYSTEMML-396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Boehm reassigned SYSTEMML-396: --- Assignee: Matthias Boehm > Multi-threaded cumulative aggregates > > > Key: SYSTEMML-396 > URL: https://issues.apache.org/jira/browse/SYSTEMML-396 > Project: SystemML > Issue Type: Task > Components: Compiler, Runtime >Reporter: Matthias Boehm >Assignee: Matthias Boehm > Fix For: SystemML 0.10
[jira] [Updated] (SYSTEMML-560) Distributed frame representation
[ https://issues.apache.org/jira/browse/SYSTEMML-560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Boehm updated SYSTEMML-560:
Description:
The major design goals for our distributed binary block frame representation are twofold:
* Seamless integration: First, we aim for a seamless integration with (1) Spark's DataFrame and DataSet representations, (2) csv text formats, and (3) SystemML's binary block matrix representations.
* Memory efficiency: Second, we are still interested in a block representation to exploit the column-wise native array storage of FrameBlocks.
As a good compromise with regard to both design goals, the initial design proposal is
{code}
FRAME := JavaPairRDD<Long, FrameBlock>
{code}
where the keys represent the row offsets of frameblock values, a frameblock value covers one or multiple rows and all columns of the frame, and, most importantly, frameblock values do not exhibit a fixed block size. NOTE that, in comparison to Spark's data frames, SystemML's frames are row-indexed (not a set of rows) in order to allow well-defined indexing operations over frames (as possible in R).
This representation would allow a shuffle-free conversion from DataFrames, DataSets, and CSV to SystemML's frames and vice versa, while still exploiting a block structure whenever possible (moderate numbers of columns). Similarly, binary block matrix to frame conversions can also be done without a shuffle in the common case ncol <= blocksize (default 1k). Finally, this representation also seems to be advantageous with regard to the common frame operations of transform, transform apply, indexing, append, and transform decode.
was:
The major design goals for our distributed binary block frame representation are twofold:
* Seamless integration: First, we aim for a seamless integration with (1) Spark's DataFrame and DataSet representations, (2) csv text formats, and (3) SystemML's binary block matrix representations.
* Memory efficiency: Second, we are still interested in a block representation to exploit the column-wise native array storage of FrameBlocks.
As a good compromise with regard to both design goals, the initial design proposal is
{code}
FRAME := JavaPairRDD<Long, FrameBlock>
{code}
where the keys represent the row offsets of frameblock values, a frameblock value covers one or multiple rows and all columns of the frame, and, most importantly, frameblock values do not exhibit a fixed block size.
This representation would allow a shuffle-free conversion from DataFrames, DataSets, and CSV to SystemML's frames and vice versa, while still exploiting a block structure whenever possible (moderate numbers of columns). Similarly, binary block matrix to frame conversions can also be done without a shuffle in the common case ncol <= blocksize (default 1k). Finally, this representation also seems to be advantageous with regard to the common frame operations of transform, transform apply, indexing, append, and transform decode.
> Distributed frame representation > > > Key: SYSTEMML-560 > URL: https://issues.apache.org/jira/browse/SYSTEMML-560 > Project: SystemML > Issue Type: Task >Reporter: Matthias Boehm >Assignee: Arvind Surve
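To make the shuffle-free conversion claim concrete, here is a minimal Java sketch of how row offsets for variable-size frame blocks could be assigned when the input is already split into partitions (for example CSV splits or DataFrame partitions). It assumes a first pass that only counts rows per partition, followed by an exclusive prefix sum that yields each partition's 1-based start row. `SimpleFrameBlock`, `KeyedBlock`, and `FrameOffsets` are hypothetical stand-ins, not SystemML's actual `FrameBlock` API, and Spark is left out so the logic runs on plain collections.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for SystemML's FrameBlock: a block of rows covering all columns.
final class SimpleFrameBlock {
    final List<String[]> rows;

    SimpleFrameBlock(List<String[]> rows) {
        this.rows = rows;
    }

    int numRows() {
        return rows.size();
    }
}

// Hypothetical (key, value) entry, mirroring one element of JavaPairRDD<Long, FrameBlock>.
final class KeyedBlock {
    final long rowOffset; // 1-based global index of the block's first row
    final SimpleFrameBlock block;

    KeyedBlock(long rowOffset, SimpleFrameBlock block) {
        this.rowOffset = rowOffset;
        this.block = block;
    }
}

final class FrameOffsets {
    // Pass 1 result (rows per partition) -> 1-based start offset of every partition.
    // This is an exclusive prefix sum shifted by one.
    static long[] startOffsets(long[] partitionRowCounts) {
        long[] offsets = new long[partitionRowCounts.length];
        long next = 1;
        for (int i = 0; i < partitionRowCounts.length; i++) {
            offsets[i] = next;
            next += partitionRowCounts[i];
        }
        return offsets;
    }

    // Pass 2: each partition becomes one block keyed by its start offset.
    // Blocks keep the size of their partition, so no rows move between partitions.
    static List<KeyedBlock> toKeyedBlocks(List<List<String[]>> partitions) {
        long[] counts = new long[partitions.size()];
        for (int i = 0; i < counts.length; i++) {
            counts[i] = partitions.get(i).size();
        }
        long[] offsets = startOffsets(counts);
        List<KeyedBlock> result = new ArrayList<>();
        for (int i = 0; i < partitions.size(); i++) {
            result.add(new KeyedBlock(offsets[i], new SimpleFrameBlock(partitions.get(i))));
        }
        return result;
    }
}
```

In a Spark setting, the counting pass might be a `mapPartitions` that returns one count per partition and the keying pass a `mapPartitionsWithIndex` that looks up its partition's offset; in this sketch only the small array of counts is combined centrally, and because each block keeps its partition's size, no fixed block size is needed.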
[jira] [Created] (SYSTEMML-602) Investigation internal use of Dataset
Matthias Boehm created SYSTEMML-602: --- Summary: Investigation internal use of Dataset Key: SYSTEMML-602 URL: https://issues.apache.org/jira/browse/SYSTEMML-602 Project: SystemML Issue Type: Sub-task Reporter: Matthias Boehm
[jira] [Created] (SYSTEMML-601) Data converters frame to/from binary block matrices
Matthias Boehm created SYSTEMML-601: --- Summary: Data converters frame to/from binary block matrices Key: SYSTEMML-601 URL: https://issues.apache.org/jira/browse/SYSTEMML-601 Project: SystemML Issue Type: Sub-task Reporter: Matthias Boehm
[jira] [Created] (SYSTEMML-599) Data converters frame to/frame spark dataset/dataframe
Matthias Boehm created SYSTEMML-599: --- Summary: Data converters frame to/frame spark dataset/dataframe Key: SYSTEMML-599 URL: https://issues.apache.org/jira/browse/SYSTEMML-599 Project: SystemML Issue Type: Sub-task Reporter: Matthias Boehm
[jira] [Updated] (SYSTEMML-600) Data converters frame to/from csv text format
[ https://issues.apache.org/jira/browse/SYSTEMML-600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Boehm updated SYSTEMML-600: Summary: Data converters frame to/from csv text format (was: Data converters frame to/frome csv text format) > Data converters frame to/from csv text format > - > > Key: SYSTEMML-600 > URL: https://issues.apache.org/jira/browse/SYSTEMML-600 > Project: SystemML > Issue Type: Sub-task >Reporter: Matthias Boehm
[jira] [Updated] (SYSTEMML-599) Data converters frame to/from spark dataset/dataframe
[ https://issues.apache.org/jira/browse/SYSTEMML-599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Boehm updated SYSTEMML-599: Summary: Data converters frame to/from spark dataset/dataframe (was: Data converters frame to/frame spark dataset/dataframe) > Data converters frame to/from spark dataset/dataframe > - > > Key: SYSTEMML-599 > URL: https://issues.apache.org/jira/browse/SYSTEMML-599 > Project: SystemML > Issue Type: Sub-task >Reporter: Matthias Boehm
[jira] [Created] (SYSTEMML-600) Data converters frame to/frome csv text format
Matthias Boehm created SYSTEMML-600: --- Summary: Data converters frame to/frome csv text format Key: SYSTEMML-600 URL: https://issues.apache.org/jira/browse/SYSTEMML-600 Project: SystemML Issue Type: Sub-task Reporter: Matthias Boehm
[jira] [Created] (SYSTEMML-598) Modifications binary frame CP readers/writers
Matthias Boehm created SYSTEMML-598: --- Summary: Modifications binary frame CP readers/writers Key: SYSTEMML-598 URL: https://issues.apache.org/jira/browse/SYSTEMML-598 Project: SystemML Issue Type: Sub-task Reporter: Matthias Boehm
[jira] [Updated] (SYSTEMML-560) Distributed frame representation
[ https://issues.apache.org/jira/browse/SYSTEMML-560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Boehm updated SYSTEMML-560:
Description:
The major design goals for our distributed binary block frame representation are twofold:
* Seamless integration: First, we aim for a seamless integration with (1) Spark's DataFrame and DataSet representations, (2) csv text formats, and (3) SystemML's binary block matrix representations.
* Memory efficiency: Second, we are still interested in a block representation to exploit the column-wise native array storage of FrameBlocks.
As a good compromise with regard to both design goals, the initial design proposal is
{code}
FRAME := JavaPairRDD<Long, FrameBlock>
{code}
where the keys represent the row offsets of frameblock values, a frameblock value covers one or multiple rows and all columns of the frame, and, most importantly, frameblock values do not exhibit a fixed block size.
This representation would allow a shuffle-free conversion from DataFrames, DataSets, and CSV to SystemML's frames and vice versa, while still exploiting a block structure whenever possible (moderate numbers of columns). Similarly, binary block matrix to frame conversions can also be done without a shuffle in the common case ncol <= blocksize (default 1k). Finally, this representation also seems to be advantageous with regard to the common frame operations of transform, transform apply, indexing, append, and transform decode.
was:
The major design goals for our distributed binary block frame representation are twofold:
* Seamless integration: First, we aim for a seamless integration with (1) Spark's DataFrame and DataSet representations, (2) csv text formats, and (3) SystemML's binary block matrix representations.
* Memory efficiency: Second, we are still interested in a block representation to exploit the column-wise native array storage of FrameBlocks.
As a good compromise with regard to both design goals, the initial design proposal is
{code}
FRAME := JavaPairRDD<Long, FrameBlock>
{code}
where the keys represent the row offsets of frameblock values, a frameblock value covers one or multiple rows and all columns of the frame, and, most importantly, frameblock values do not exhibit a fixed block size.
> Distributed frame representation > > > Key: SYSTEMML-560 > URL: https://issues.apache.org/jira/browse/SYSTEMML-560 > Project: SystemML > Issue Type: Task >Reporter: Matthias Boehm >Assignee: Arvind Surve
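The matrix-to-frame case can be sketched in the same spirit. Assuming SystemML-style 1-based block indexes and square blocks of `blockSize` rows and columns (1000 by default, per the description), a matrix with ncol <= blockSize has exactly one column block, so each matrix block maps to exactly one frame block keyed by its first row. With several column blocks, the pieces of one row range live in different blocks and would have to be grouped first, which implies a shuffle. `MatrixToFrameKeys` and its methods are hypothetical names for illustration only.

```java
// Hypothetical helper illustrating why ncol <= blockSize allows a shuffle-free conversion.
final class MatrixToFrameKeys {
    // Number of column blocks for a matrix with ncol columns (ceiling division).
    static long numColumnBlocks(long ncol, int blockSize) {
        return (ncol + blockSize - 1) / blockSize;
    }

    // With a single column block, each matrix block (rowBlockIndex, 1) maps 1:1 to a
    // frame block, so the conversion is a per-block map without a shuffle.
    static boolean isShuffleFree(long ncol, int blockSize) {
        return numColumnBlocks(ncol, blockSize) == 1;
    }

    // Frame key (1-based row offset) for a 1-based matrix row-block index.
    static long frameRowOffset(long rowBlockIndex, int blockSize) {
        return (rowBlockIndex - 1) * blockSize + 1;
    }

    // Copies the dense values of one matrix block into frame rows (boxed Doubles here).
    static Object[][] toFrameRows(double[][] matrixBlock) {
        Object[][] rows = new Object[matrixBlock.length][];
        for (int i = 0; i < matrixBlock.length; i++) {
            rows[i] = new Object[matrixBlock[i].length];
            for (int j = 0; j < matrixBlock[i].length; j++) {
                rows[i][j] = matrixBlock[i][j]; // autoboxed to Double
            }
        }
        return rows;
    }
}
```

For example, with the default block size of 1000, a 500-column matrix qualifies and its third row block becomes the frame block keyed 2001, while a 2500-column matrix spans three column blocks and does not qualify.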
[jira] [Updated] (SYSTEMML-560) Distributed frame representation
[ https://issues.apache.org/jira/browse/SYSTEMML-560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Boehm updated SYSTEMML-560: Assignee: Arvind Surve > Distributed frame representation > > > Key: SYSTEMML-560 > URL: https://issues.apache.org/jira/browse/SYSTEMML-560 > Project: SystemML > Issue Type: Task >Reporter: Matthias Boehm >Assignee: Arvind Surve
[jira] [Updated] (SYSTEMML-560) Distributed frame representation
[ https://issues.apache.org/jira/browse/SYSTEMML-560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Boehm updated SYSTEMML-560:
Description:
The major design goals for our distributed binary block frame representation are twofold:
* Seamless integration: First, we aim for a seamless integration with (1) Spark's DataFrame and DataSet representations, (2) csv text formats, and (3) SystemML's binary block matrix representations.
* Memory efficiency: Second, we are still interested in a block representation to exploit the column-wise native array storage of FrameBlocks.
As a good compromise with regard to both design goals, the initial design proposal is
{code}
FRAME := JavaPairRDD<Long, FrameBlock>
{code}
where the keys represent the row offsets of frameblock values, a frameblock value covers one or multiple rows and all columns of the frame, and, most importantly, frameblock values do not exhibit a fixed block size.
was:
The major design goals for our distributed binary block frame representation are twofold:
* Seamless integration: First, we aim for a seamless integration with (1) Spark's DataFrame and DataSet representations, (2) csv text formats, and (3) SystemML's binary block matrix representations.
* Memory efficiency: Second, we are still interested in a block representation to exploit the column-wise native array storage of FrameBlocks.
As a good compromise with regard to both design goals, the initial design proposal is
{code}
FRAME := JavaPairRDD<Long, FrameBlock>
{code}
where the keys represent the row offsets of frameblock values, a frameblock value covers one or multiple rows and all columns of the frame, and, most importantly, frameblocks values do not exhibit a fixed block size.
> Distributed frame representation > > > Key: SYSTEMML-560 > URL: https://issues.apache.org/jira/browse/SYSTEMML-560 > Project: SystemML > Issue Type: Task >Reporter: Matthias Boehm
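Because frame blocks have no fixed size, a row index cannot be turned into a block key by plain division as with fixed-size matrix blocks. A minimal sketch, assuming the block start offsets and row counts are known (for example collected from the keys and block sizes of the pair RDD), keeps them in a sorted map and uses `TreeMap.floorEntry` to find the block holding a given 1-based row in logarithmic time. `RowIndexedFrame` is a hypothetical helper, not part of SystemML.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical index over variable-size frame blocks keyed by their 1-based start row.
final class RowIndexedFrame {
    // Block start offset (1-based global row index) -> number of rows in that block.
    private final TreeMap<Long, Integer> blocks = new TreeMap<>();

    void addBlock(long rowOffset, int numRows) {
        blocks.put(rowOffset, numRows);
    }

    // Start offset of the block that holds global row r, or -1 if no block covers r.
    // floorEntry returns the last block starting at or before r.
    long blockFor(long r) {
        Map.Entry<Long, Integer> e = blocks.floorEntry(r);
        if (e == null || r >= e.getKey() + e.getValue()) {
            return -1;
        }
        return e.getKey();
    }

    // 0-based position of row r inside its block, or -1 if r is not covered.
    long localRow(long r) {
        long start = blockFor(r);
        return start < 0 ? -1 : r - start;
    }
}
```

With blocks starting at rows 1, 4, and 9 (holding 3, 5, and 2 rows), row 6 falls into the block keyed 4 at local position 2, while row 11 is not covered by any block.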
[jira] [Updated] (SYSTEMML-560) Distributed frame representation
[ https://issues.apache.org/jira/browse/SYSTEMML-560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Boehm updated SYSTEMML-560:
Description:
The major design goals for our distributed binary block frame representation are twofold:
* Seamless integration: First, we aim for a seamless integration with (1) Spark's DataFrame and DataSet representations, (2) csv text formats, and (3) SystemML's binary block matrix representations.
* Memory efficiency: Second, we are still interested in a block representation to exploit the column-wise native array storage of FrameBlocks.
As a good compromise with regard to both design goals, the initial design proposal is
{code}
FRAME := JavaPairRDD<Long, FrameBlock>
{code}
where the keys represent the row offsets of frameblock values, a frameblock value covers one or multiple rows and all columns of the frame, and, most importantly, frameblocks values do not exhibit a fixed block size.
> Distributed frame representation > > > Key: SYSTEMML-560 > URL: https://issues.apache.org/jira/browse/SYSTEMML-560 > Project: SystemML > Issue Type: Task >Reporter: Matthias Boehm
[jira] [Assigned] (SYSTEMML-570) Consolidation buffer pool matrices and frames
[ https://issues.apache.org/jira/browse/SYSTEMML-570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Boehm reassigned SYSTEMML-570: --- Assignee: Matthias Boehm > Consolidation buffer pool matrices and frames > - > > Key: SYSTEMML-570 > URL: https://issues.apache.org/jira/browse/SYSTEMML-570 > Project: SystemML > Issue Type: Task > Components: Runtime >Reporter: Matthias Boehm >Assignee: Matthias Boehm
[jira] [Resolved] (SYSTEMML-533) Error when print consecutive pound signs
[ https://issues.apache.org/jira/browse/SYSTEMML-533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias Boehm resolved SYSTEMML-533. - Resolution: Fixed Fix Version/s: SystemML 0.10 > Error when print consecutive pound signs > > > Key: SYSTEMML-533 > URL: https://issues.apache.org/jira/browse/SYSTEMML-533 > Project: SystemML > Issue Type: Bug > Components: Parser >Reporter: Deron Eriksson >Assignee: Matthias Boehm >Priority: Minor > Fix For: SystemML 0.10
> Trying to print consecutive pound (#) signs results in an error:
> {code}
> print("#"); # this works
> print("##"); # this doesn't work
> print("###"); # this doesn't work
> print("#"+"#"); # this doesn't work
> a="##";
> print(a); # this doesn't work
> b="#";
> print(b+b); # this doesn't work
> {code}
> Error:
> {code}
> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -12
>   at java.lang.String.substring(String.java:1937)
>   at org.apache.sysml.lops.runtime.RunMRJobs.updateInstLabels(RunMRJobs.java:449)
>   at org.apache.sysml.lops.runtime.RunMRJobs.updateLabels(RunMRJobs.java:478)
>   at org.apache.sysml.runtime.instructions.cp.CPInstruction.preprocessInstruction(CPInstruction.java:81)
>   at org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:306)
> {code}