[jira] [Assigned] (SYSTEMML-399) Improvements multi-threaded read/write (all formats)

2016-03-27 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm reassigned SYSTEMML-399:
---

Assignee: Matthias Boehm

> Improvements multi-threaded read/write (all formats)
> 
>
> Key: SYSTEMML-399
> URL: https://issues.apache.org/jira/browse/SYSTEMML-399
> Project: SystemML
>  Issue Type: Task
>  Components: Runtime
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (SYSTEMML-396) Multi-threaded cumulative aggregates

2016-03-27 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-396.
-
   Resolution: Done
Fix Version/s: SystemML 0.10

> Multi-threaded cumulative aggregates
> 
>
> Key: SYSTEMML-396
> URL: https://issues.apache.org/jira/browse/SYSTEMML-396
> Project: SystemML
>  Issue Type: Task
>  Components: Compiler, Runtime
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 0.10
>
>






[jira] [Assigned] (SYSTEMML-396) Multi-threaded cumulative aggregates

2016-03-27 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm reassigned SYSTEMML-396:
---

Assignee: Matthias Boehm

> Multi-threaded cumulative aggregates
> 
>
> Key: SYSTEMML-396
> URL: https://issues.apache.org/jira/browse/SYSTEMML-396
> Project: SystemML
>  Issue Type: Task
>  Components: Compiler, Runtime
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
> Fix For: SystemML 0.10
>
>






[jira] [Updated] (SYSTEMML-560) Distributed frame representation

2016-03-27 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-560:

Description: 
The major design goals for our distributed binary block frame representation 
are twofold:
* Seamless integration: First, we aim for a seamless integration with (1) 
Spark's DataFrame and DataSet representations, (2) csv text formats, and  (3) 
SystemML's binary block matrix representations.
* Memory efficiency: Second, we are still interested in a block representation 
to exploit the column-wise native array storage of FrameBlocks.  


As a good compromise with regard to both design goals, the initial design 
proposal is 
{code}
FRAME := JavaPairRDD 
{code}
where the keys represent the row offsets of frameblock values, a frameblock 
value covers one or multiple rows and all columns of the frame, and most 
importantly frameblock values do not exhibit a fixed block size. NOTE that in 
comparison to Spark's data frames, SystemML's frames are row-indexed (not a set 
of rows) in order to allow well-defined indexing operations over frames (as 
possible in R).  

This representation would allow a shuffle-free conversion from DataFrames, 
DataSets, CSV to SystemML's Frames and vice versa while still exploiting a 
block structure whenever possible (moderate numbers of columns). Similarly, 
binary block matrix to frame conversions can also be done without shuffle in 
the common case ncol <= blocksize (default 1k). Finally, this representation 
also seems to be advantageous with regard to the common frame operations of 
transform, transform apply, indexing, append, and transform decode.

  was:
The major design goals for our distributed binary block frame representation 
are twofold:
* Seamless integration: First, we aim for a seamless integration with (1) 
Spark's DataFrame and DataSet representations, (2) csv text formats, and  (3) 
SystemML's binary block matrix representations.
* Memory efficiency: Second, we are still interested in a block representation 
to exploit the column-wise native array storage of FrameBlocks.  


As a good compromise with regard to both design goals, the initial design 
proposal is 
{code}
FRAME := JavaPairRDD 
{code}
where the keys represent the row offsets of frameblock values, a frameblock 
value covers one or multiple rows and all columns of the frame, and most 
importantly frameblock values do not exhibit a fixed block size.

This representation would allow a shuffle-free conversion from DataFrames, 
DataSets, CSV to SystemML's Frames and vice versa while still exploiting a 
block structure whenever possible (moderate numbers of columns). Similarly, 
binary block matrix to frame conversions can also be done without shuffle in 
the common case ncol <= blocksize (default 1k). Finally, this representation 
also seems to be advantageous with regard to the common frame operations of 
transform, transform apply, indexing, append, and transform decode.


> Distributed frame representation
> 
>
> Key: SYSTEMML-560
> URL: https://issues.apache.org/jira/browse/SYSTEMML-560
> Project: SystemML
>  Issue Type: Task
>Reporter: Matthias Boehm
>Assignee: Arvind Surve
>
> The major design goals for our distributed binary block frame representation 
> are twofold:
> * Seamless integration: First, we aim for a seamless integration with (1) 
> Spark's DataFrame and DataSet representations, (2) csv text formats, and  (3) 
> SystemML's binary block matrix representations.
> * Memory efficiency: Second, we are still interested in a block 
> representation to exploit the column-wise native array storage of 
> FrameBlocks.  
> As a good compromise with regard to both design goals, the initial design 
> proposal is 
> {code}
> FRAME := JavaPairRDD 
> {code}
> where the keys represent the row offsets of frameblock values, a frameblock 
> value covers one or multiple rows and all columns of the frame, and most 
> importantly frameblock values do not exhibit a fixed block size. NOTE that in 
> comparison to Spark's data frames, SystemML's frames are row-indexed (not a 
> set of rows) in order to allow well-defined indexing operations over frames 
> (as possible in R).  
> This representation would allow a shuffle-free conversion from DataFrames, 
> DataSets, CSV to SystemML's Frames and vice versa while still exploiting a 
> block structure whenever possible (moderate numbers of columns). Similarly, 
> binary block matrix to frame conversions can also be done without shuffle in 
> the common case ncol <= blocksize (default 1k). Finally, this representation 
> also seems to be advantageous with regard to the common frame operations of 
> transform, transform apply, indexing, append, and transform decode.
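
As a rough illustration of the representation described above, the following
sketch models a frame as a sorted map from row offsets to variable-size blocks,
using plain Java collections instead of Spark. The names FrameSketch, Block,
fromPartitions, and getRow are hypothetical and not part of SystemML, and the
1-based long row offsets are an assumption. Each input partition becomes exactly
one block, so no rows have to move between partitions; only the starting offset
of each partition needs to be known.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not SystemML code: a frame as (row offset -> block) pairs.
final class FrameSketch {

    // A block holds one or more consecutive rows and all columns of the frame.
    static final class Block {
        final String[][] rows; // rows[i][j] = value in local row i, column j

        Block(String[][] rows) {
            this.rows = rows;
        }

        int numRows() {
            return rows.length;
        }
    }

    // Key: global offset of the block's first row (1-based here, an assumption).
    private final TreeMap<Long, Block> blocks = new TreeMap<>();

    // Turns each non-empty partition into exactly one block; block sizes may differ.
    static FrameSketch fromPartitions(List<String[][]> partitions) {
        FrameSketch frame = new FrameSketch();
        long offset = 1;
        for (String[][] part : partitions) {
            if (part.length == 0) {
                continue;
            }
            frame.blocks.put(offset, new Block(part));
            offset += part.length;
        }
        return frame;
    }

    // Row lookup by global index: pick the block with the largest offset <= rowIndex.
    String[] getRow(long rowIndex) {
        Map.Entry<Long, Block> entry = blocks.floorEntry(rowIndex);
        if (entry == null) {
            throw new IndexOutOfBoundsException("row " + rowIndex);
        }
        int local = (int) (rowIndex - entry.getKey());
        if (local >= entry.getValue().numRows()) {
            throw new IndexOutOfBoundsException("row " + rowIndex);
        }
        return entry.getValue().rows[local];
    }

    int numBlocks() {
        return blocks.size();
    }

    public static void main(String[] args) {
        // Two partitions of different sizes yield two blocks of different sizes.
        List<String[][]> partitions = Arrays.asList(
            new String[][] {{"a", "1"}, {"b", "2"}, {"c", "3"}},
            new String[][] {{"d", "4"}});
        FrameSketch frame = FrameSketch.fromPartitions(partitions);
        System.out.println(frame.numBlocks() + " blocks, row 4 = "
            + Arrays.toString(frame.getRow(4)));
    }
}
```

Because the keys are row offsets rather than fixed block indexes, a row lookup
is a floor search on the key followed by a local index into the block. This is
one way to get the well-defined row indexing mentioned above without requiring
a fixed block size.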





[jira] [Created] (SYSTEMML-602) Investigation internal use of Dataset

2016-03-27 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-602:
---

 Summary: Investigation internal use of Dataset
 Key: SYSTEMML-602
 URL: https://issues.apache.org/jira/browse/SYSTEMML-602
 Project: SystemML
  Issue Type: Sub-task
Reporter: Matthias Boehm








[jira] [Created] (SYSTEMML-601) Data converters frame to/from binary block matrices

2016-03-27 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-601:
---

 Summary: Data converters frame to/from binary block matrices
 Key: SYSTEMML-601
 URL: https://issues.apache.org/jira/browse/SYSTEMML-601
 Project: SystemML
  Issue Type: Sub-task
Reporter: Matthias Boehm








[jira] [Created] (SYSTEMML-599) Data converters frame to/frame spark dataset/dataframe

2016-03-27 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-599:
---

 Summary: Data converters frame to/frame spark dataset/dataframe
 Key: SYSTEMML-599
 URL: https://issues.apache.org/jira/browse/SYSTEMML-599
 Project: SystemML
  Issue Type: Sub-task
Reporter: Matthias Boehm








[jira] [Updated] (SYSTEMML-600) Data converters frame to/from csv text format

2016-03-27 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-600:

Summary: Data converters frame to/from csv text format  (was: Data 
converters frame to/frome csv text format)

> Data converters frame to/from csv text format
> -
>
> Key: SYSTEMML-600
> URL: https://issues.apache.org/jira/browse/SYSTEMML-600
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Matthias Boehm
>






[jira] [Updated] (SYSTEMML-599) Data converters frame to/from spark dataset/dataframe

2016-03-27 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-599:

Summary: Data converters frame to/from spark dataset/dataframe  (was: Data 
converters frame to/frame spark dataset/dataframe)

> Data converters frame to/from spark dataset/dataframe
> -
>
> Key: SYSTEMML-599
> URL: https://issues.apache.org/jira/browse/SYSTEMML-599
> Project: SystemML
>  Issue Type: Sub-task
>Reporter: Matthias Boehm
>






[jira] [Created] (SYSTEMML-600) Data converters frame to/frome csv text format

2016-03-27 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-600:
---

 Summary: Data converters frame to/frome csv text format
 Key: SYSTEMML-600
 URL: https://issues.apache.org/jira/browse/SYSTEMML-600
 Project: SystemML
  Issue Type: Sub-task
Reporter: Matthias Boehm








[jira] [Created] (SYSTEMML-598) Modifications binary frame CP readers/writers

2016-03-27 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-598:
---

 Summary: Modifications binary frame CP readers/writers
 Key: SYSTEMML-598
 URL: https://issues.apache.org/jira/browse/SYSTEMML-598
 Project: SystemML
  Issue Type: Sub-task
Reporter: Matthias Boehm








[jira] [Updated] (SYSTEMML-560) Distributed frame representation

2016-03-27 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-560:

Description: 
The major design goals for our distributed binary block frame representation 
are twofold:
* Seamless integration: First, we aim for a seamless integration with (1) 
Spark's DataFrame and DataSet representations, (2) csv text formats, and  (3) 
SystemML's binary block matrix representations.
* Memory efficiency: Second, we are still interested in a block representation 
to exploit the column-wise native array storage of FrameBlocks.  


As a good compromise with regard to both design goals, the initial design 
proposal is 
{code}
FRAME := JavaPairRDD 
{code}
where the keys represent the row offsets of frameblock values, a frameblock 
value covers one or multiple rows and all columns of the frame, and most 
importantly frameblock values do not exhibit a fixed block size.

This representation would allow a shuffle-free conversion from DataFrames, 
DataSets, CSV to SystemML's Frames and vice versa while still exploiting a 
block structure whenever possible (moderate numbers of columns). Similarly, 
binary block matrix to frame conversions can also be done without shuffle in 
the common case ncol <= blocksize (default 1k). Finally, this representation 
also seems to be advantageous with regard to the common frame operations of 
transform, transform apply, indexing, append, and transform decode.

  was:
The major design goals for our distributed binary block frame representation 
are twofold:
* Seamless integration: First, we aim for a seamless integration with (1) 
Spark's DataFrame and DataSet representations, (2) csv text formats, and  (3) 
SystemML's binary block matrix representations.
* Memory efficiency: Second, we are still interested in a block representation 
to exploit the column-wise native array storage of FrameBlocks.  

As a good compromise with regard to both design goals, the initial design 
proposal is 
{code}
FRAME := JavaPairRDD 
{code}
where the keys represent the row offsets of frameblock values, a frameblock 
value covers one or multiple rows and all columns of the frame, and most 
importantly frameblock values do not exhibit a fixed block size. 


> Distributed frame representation
> 
>
> Key: SYSTEMML-560
> URL: https://issues.apache.org/jira/browse/SYSTEMML-560
> Project: SystemML
>  Issue Type: Task
>Reporter: Matthias Boehm
>Assignee: Arvind Surve
>
> The major design goals for our distributed binary block frame representation 
> are twofold:
> * Seamless integration: First, we aim for a seamless integration with (1) 
> Spark's DataFrame and DataSet representations, (2) csv text formats, and  (3) 
> SystemML's binary block matrix representations.
> * Memory efficiency: Second, we are still interested in a block 
> representation to exploit the column-wise native array storage of 
> FrameBlocks.  
> As a good compromise with regard to both design goals, the initial design 
> proposal is 
> {code}
> FRAME := JavaPairRDD 
> {code}
> where the keys represent the row offsets of frameblock values, a frameblock 
> value covers one or multiple rows and all columns of the frame, and most 
> importantly frameblock values do not exhibit a fixed block size.
> This representation would allow a shuffle-free conversion from DataFrames, 
> DataSets, CSV to SystemML's Frames and vice versa while still exploiting a 
> block structure whenever possible (moderate numbers of columns). Similarly, 
> binary block matrix to frame conversions can also be done without shuffle in 
> the common case ncol <= blocksize (default 1k). Finally, this representation 
> also seems to be advantageous with regard to the common frame operations of 
> transform, transform apply, indexing, append, and transform decode.





[jira] [Updated] (SYSTEMML-560) Distributed frame representation

2016-03-27 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-560:

Assignee: Arvind Surve

> Distributed frame representation
> 
>
> Key: SYSTEMML-560
> URL: https://issues.apache.org/jira/browse/SYSTEMML-560
> Project: SystemML
>  Issue Type: Task
>Reporter: Matthias Boehm
>Assignee: Arvind Surve
>
> The major design goals for our distributed binary block frame representation 
> are twofold:
> * Seamless integration: First, we aim for a seamless integration with (1) 
> Spark's DataFrame and DataSet representations, (2) csv text formats, and  (3) 
> SystemML's binary block matrix representations.
> * Memory efficiency: Second, we are still interested in a block 
> representation to exploit the column-wise native array storage of 
> FrameBlocks.  
> As a good compromise with regard to both design goals, the initial design 
> proposal is 
> {code}
> FRAME := JavaPairRDD 
> {code}
> where the keys represent the row offsets of frameblock values, a frameblock 
> value covers one or multiple rows and all columns of the frame, and most 
> importantly frameblock values do not exhibit a fixed block size. 





[jira] [Updated] (SYSTEMML-560) Distributed frame representation

2016-03-27 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-560:

Description: 
The major design goals for our distributed binary block frame representation 
are twofold:
* Seamless integration: First, we aim for a seamless integration with (1) 
Spark's DataFrame and DataSet representations, (2) csv text formats, and  (3) 
SystemML's binary block matrix representations.
* Memory efficiency: Second, we are still interested in a block representation 
to exploit the column-wise native array storage of FrameBlocks.  

As a good compromise with regard to both design goals, the initial design 
proposal is 
{code}
FRAME := JavaPairRDD 
{code}
where the keys represent the row offsets of frameblock values, a frameblock 
value covers one or multiple rows and all columns of the frame, and most 
importantly frameblock values do not exhibit a fixed block size. 

  was:
The major design goals for our distributed binary block frame representation 
are twofold:
* Seamless integration: First, we aim for a seamless integration with (1) 
Spark's DataFrame and DataSet representations, (2) csv text formats, and  (3) 
SystemML's binary block matrix representations.
* Memory efficiency: Second, we are still interested in a block representation 
to exploit the column-wise native array storage of FrameBlocks.  

As a good compromise with regard to both design goals, the initial design 
proposal is 
{code}
FRAME := JavaPairRDD 
{code}
where the keys represent the row offsets of frameblock values, a frameblock 
value covers one or multiple rows and all columns of the frame, and most 
importantly frameblocks values do not exhibit a fixed block size. 


> Distributed frame representation
> 
>
> Key: SYSTEMML-560
> URL: https://issues.apache.org/jira/browse/SYSTEMML-560
> Project: SystemML
>  Issue Type: Task
>Reporter: Matthias Boehm
>
> The major design goals for our distributed binary block frame representation 
> are twofold:
> * Seamless integration: First, we aim for a seamless integration with (1) 
> Spark's DataFrame and DataSet representations, (2) csv text formats, and  (3) 
> SystemML's binary block matrix representations.
> * Memory efficiency: Second, we are still interested in a block 
> representation to exploit the column-wise native array storage of 
> FrameBlocks.  
> As a good compromise with regard to both design goals, the initial design 
> proposal is 
> {code}
> FRAME := JavaPairRDD 
> {code}
> where the keys represent the row offsets of frameblock values, a frameblock 
> value covers one or multiple rows and all columns of the frame, and most 
> importantly frameblock values do not exhibit a fixed block size. 





[jira] [Updated] (SYSTEMML-560) Distributed frame representation

2016-03-27 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-560:

Description: 
The major design goals for our distributed binary block frame representation 
are twofold:
* Seamless integration: First, we aim for a seamless integration with (1) 
Spark's DataFrame and DataSet representations, (2) csv text formats, and  (3) 
SystemML's binary block matrix representations.
* Memory efficiency: Second, we are still interested in a block representation 
to exploit the column-wise native array storage of FrameBlocks.  

As a good compromise with regard to both design goals, the initial design 
proposal is 
{code}
FRAME := JavaPairRDD 
{code}
where the keys represent the row offsets of frameblock values, a frameblock 
value covers one or multiple rows and all columns of the frame, and most 
importantly frameblocks values do not exhibit a fixed block size. 

> Distributed frame representation
> 
>
> Key: SYSTEMML-560
> URL: https://issues.apache.org/jira/browse/SYSTEMML-560
> Project: SystemML
>  Issue Type: Task
>Reporter: Matthias Boehm
>
> The major design goals for our distributed binary block frame representation 
> are twofold:
> * Seamless integration: First, we aim for a seamless integration with (1) 
> Spark's DataFrame and DataSet representations, (2) csv text formats, and  (3) 
> SystemML's binary block matrix representations.
> * Memory efficiency: Second, we are still interested in a block 
> representation to exploit the column-wise native array storage of 
> FrameBlocks.  
> As a good compromise with regard to both design goals, the initial design 
> proposal is 
> {code}
> FRAME := JavaPairRDD 
> {code}
> where the keys represent the row offsets of frameblock values, a frameblock 
> value covers one or multiple rows and all columns of the frame, and most 
> importantly frameblocks values do not exhibit a fixed block size. 





[jira] [Assigned] (SYSTEMML-570) Consolidation buffer pool matrices and frames

2016-03-27 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm reassigned SYSTEMML-570:
---

Assignee: Matthias Boehm

> Consolidation buffer pool matrices and frames
> -
>
> Key: SYSTEMML-570
> URL: https://issues.apache.org/jira/browse/SYSTEMML-570
> Project: SystemML
>  Issue Type: Task
>  Components: Runtime
>Reporter: Matthias Boehm
>Assignee: Matthias Boehm
>






[jira] [Resolved] (SYSTEMML-533) Error when print consecutive pound signs

2016-03-27 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm resolved SYSTEMML-533.
-
   Resolution: Fixed
Fix Version/s: SystemML 0.10

> Error when print consecutive pound signs
> 
>
> Key: SYSTEMML-533
> URL: https://issues.apache.org/jira/browse/SYSTEMML-533
> Project: SystemML
>  Issue Type: Bug
>  Components: Parser
>Reporter: Deron Eriksson
>Assignee: Matthias Boehm
>Priority: Minor
> Fix For: SystemML 0.10
>
>
> Trying to print consecutive pound (#) signs results in an error:
> {code}
> print("#"); # this works
> print("##"); # this doesn't work
> print("###"); # this doesn't work
> print("#"+"#"); # this doesn't work
> a="##";
> print(a); # this doesn't work
> b="#";
> print(b+b); # this doesn't work
> {code}
> Error:
> {code}
> Caused by: java.lang.StringIndexOutOfBoundsException: String index out of 
> range: -12
>   at java.lang.String.substring(String.java:1937)
>   at 
> org.apache.sysml.lops.runtime.RunMRJobs.updateInstLabels(RunMRJobs.java:449)
>   at 
> org.apache.sysml.lops.runtime.RunMRJobs.updateLabels(RunMRJobs.java:478)
>   at 
> org.apache.sysml.runtime.instructions.cp.CPInstruction.preprocessInstruction(CPInstruction.java:81)
>   at 
> org.apache.sysml.runtime.controlprogram.ProgramBlock.executeSingleInstruction(ProgramBlock.java:306)
> {code}


