[jira] [Issue Comment Deleted] (SYSTEMML-542) Investigate direct integration of SystemML (DML) with Apache Zeppelin

2016-03-06 Thread Nakul Jindal (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nakul Jindal updated SYSTEMML-542:
--
Comment: was deleted

(was: Spark DML in Zeppelin)

> Investigate direct integration of SystemML (DML) with Apache Zeppelin
> ----------------------------------------------------------------------
>
> Key: SYSTEMML-542
> URL: https://issues.apache.org/jira/browse/SYSTEMML-542
> Project: SystemML
>  Issue Type: New Feature
>Reporter: Nakul Jindal
>Priority: Minor
> Attachments: poc-single-node-dml-on-zeppelin.png, spark_cell.png, 
> spark_cell_anotated.png, spark_dml_cell.png, spark_dml_cell_anotated.png, 
> spark_negloglik_cell.png, spark_negloglik_cell_anotated.png
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (SYSTEMML-538) Decoding ID columns to string

2016-03-06 Thread Matthias Boehm (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15182509#comment-15182509
 ] 

Matthias Boehm commented on SYSTEMML-538:
-----------------------------------------

Yes, it does [~niketanpansare]. I'll add the details of the frame discussion 
in another JIRA; please focus here on how you want to support the 
apply-transform. 

> Decoding ID columns to string
> -----------------------------
>
> Key: SYSTEMML-538
> URL: https://issues.apache.org/jira/browse/SYSTEMML-538
> Project: SystemML
>  Issue Type: New Feature
>  Components: APIs
>Reporter: Prithviraj Sen
>
> Currently, the transform operation allows one to consume a frame containing 
> strings and replace the strings with integer IDs. However, there is no 
> operation that provides the inverse of this functionality. In other words, it 
> would be nice to have an operation that allows one to use a recode map 
> produced by a previously invoked transform operation and replace the integer 
> IDs with the corresponding strings provided in the recode map.





[jira] [Commented] (SYSTEMML-538) Decoding ID columns to string

2016-03-06 Thread Niketan Pansare (JIRA)

[ 
https://issues.apache.org/jira/browse/SYSTEMML-538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15182506#comment-15182506
 ] 

Niketan Pansare commented on SYSTEMML-538:
--

This would be a useful feature to have. It raises an interesting question: 
should we generalize frames, or implement an additional function for this 
operation?
transform(frame) => matrix
aboveMentionedOp(matrix) => frame, or aboveMentionedOp(frame) => frame
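To make the round trip concrete, here is a minimal, self-contained sketch of the two signatures above (the helper names recode_encode/recode_decode are illustrative, not SystemML's transform API): encoding replaces each distinct string with an integer ID and produces a recode map; decoding inverts that map to recover the strings.

```python
# Sketch of recode/decode on a single string column.
# Helper names are hypothetical, not part of SystemML's API.

def recode_encode(column):
    """Map each distinct string to an integer ID; return IDs and the recode map."""
    recode_map = {}
    ids = []
    for value in column:
        if value not in recode_map:
            recode_map[value] = len(recode_map) + 1  # 1-based IDs
        ids.append(recode_map[value])
    return ids, recode_map

def recode_decode(ids, recode_map):
    """Inverse operation: replace integer IDs with the original strings."""
    inverse = {v: k for k, v in recode_map.items()}
    return [inverse[i] for i in ids]

col = ["red", "blue", "red", "green"]
ids, rmap = recode_encode(col)  # ids == [1, 2, 1, 3]
assert recode_decode(ids, rmap) == col
```

Whether decode lives as a separate operation or as a generalized frame operation, the key input is the recode map produced by the earlier transform.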

> Decoding ID columns to string
> -----------------------------
>
> Key: SYSTEMML-538
> URL: https://issues.apache.org/jira/browse/SYSTEMML-538
> Project: SystemML
>  Issue Type: New Feature
>  Components: APIs
>Reporter: Prithviraj Sen
>
> Currently, the transform operation allows one to consume a frame containing 
> strings and replace the strings with integer IDs. However, there is no 
> operation that provides the inverse of this functionality. In other words, it 
> would be nice to have an operation that allows one to use a recode map 
> produced by a previously invoked transform operation and replace the integer 
> IDs with the corresponding strings provided in the recode map.





[jira] [Updated] (SYSTEMML-552) Performance features ALS-CG

2016-03-06 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-552:

Description: 
Over a spectrum of data sizes, ALS-CG does not always perform as well as we 
would expect, due to unnecessary overheads. This task captures related 
performance features:

1) Cache-conscious sparse wdivmm left/right: For large factors, the approach of 
iterating through the non-zeros in W and computing dot products leads to repeated 
(unnecessary) scans of the factors from main memory. 
2) Preparation of sparse W = (X!=0) w/ intrinsics: For scalar operations with !=0, 
there is already a special case, which is, however, unnecessarily conservative. We 
should realize this with a plain memcopy of the indices and a memset of 1 for the values.
3) Flop-aware operator selection for QuaternaryOp: For large ranks, all quaternary 
operators become very compute-intensive. In these situations, our heuristic 
of choosing ExecType.CP if the operation fits in driver memory does not work 
well. Hence, we should take the number of floating-point operations and 
the local/cluster degree of parallelism into account when deciding on the 
execution type.  
4) Improved parallel read of sparse binary block: Reading sparse binary block 
matrices with clen>bclen requires a global lock on append and a final sequential 
sort of sparse rows. We should use a more fine-grained locking scheme and 
sort sparse rows in parallel. 
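As a rough illustration of item 2: on a CSR-style sparse representation, and assuming no explicitly stored zeros, computing W = (X != 0) reduces to copying X's index arrays and filling the value array with 1 — no element-wise comparison needed. This is a plain-Python sketch, not SystemML's internal sparse block API:

```python
# Sketch of feature 2: build sparse W = (X != 0) by reusing X's sparsity
# structure (copy index arrays, set all values to 1) instead of evaluating
# the comparison per element. Assumes a CSR layout with no stored zeros.

def neq_zero_structural(indptr, indices, values):
    """Given CSR arrays of X, return CSR arrays of (X != 0)."""
    # The non-zero pattern is identical, so the indices carry over unchanged
    # (memcopy in a native implementation) and the values become all ones
    # (memset in a native implementation).
    return list(indptr), list(indices), [1.0] * len(values)

# X = [[0, 2, 0],
#      [5, 0, 7]]
indptr, indices, values = [0, 1, 3], [1, 0, 2], [2.0, 5.0, 7.0]
w = neq_zero_structural(indptr, indices, values)
assert w == ([0, 1, 3], [1, 0, 2], [1.0, 1.0, 1.0])
```

The "no stored zeros" assumption is exactly why the existing special case must be conservative: an explicitly stored zero would map to 0, not 1, under X != 0.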

  was:
Over a spectrum of data sizes, ALS-CG does always perform as good as we would 
expect due to unnecessary overheads. This task captures related performance 
features:

1) Cache-conscious sparse wdivmm left/right: For large factors, the approach of 
iterating through non-zeros in W and computing dot products, leads to repeated 
(unnecessary) scans of the factors from main-memory. 
2) Preparation sparse W = (X!=0) w/ intrinsics: For scalar operations with !=0, 
there is already a special case which is however unnecessarily conservative. We 
should realize this with a plain memcopy of indices and memset 1 for values.
3) Flop-aware operator selection QuaternaryOp: For large ranks, all quaternary 
operators become really compute-intensive. In these situations, our heuristic 
of choosing ExecType.CP if the operation fits in driver memory does not work 
very well. Hence, we should take the number of floating point operations and 
the local/cluster degree of parallelism into account when deciding for the 
execution type.  
4) Improved parallel read sparse binary block: Reading sparse binary block 
matrices with clen>bclen requires a global lock on append and final sequential 
sorting of sparse rows. We should use a more fine-grained locking scheme and 
sort sparse rows in parallel. 


> Performance features ALS-CG
> ---
>
> Key: SYSTEMML-552
> URL: https://issues.apache.org/jira/browse/SYSTEMML-552
> Project: SystemML
>  Issue Type: Task
>Reporter: Matthias Boehm
>
> Over a spectrum of data sizes, ALS-CG does not always perform as good as we 
> would expect due to unnecessary overheads. This task captures related 
> performance features:
> 1) Cache-conscious sparse wdivmm left/right: For large factors, the approach 
> of iterating through non-zeros in W and computing dot products, leads to 
> repeated (unnecessary) scans of the factors from main-memory. 
> 2) Preparation sparse W = (X!=0) w/ intrinsics: For scalar operations with 
> !=0, there is already a special case which is however unnecessarily 
> conservative. We should realize this with a plain memcopy of indices and 
> memset 1 for values.
> 3) Flop-aware operator selection QuaternaryOp: For large ranks, all 
> quaternary operators become really compute-intensive. In these situations, 
> our heuristic of choosing ExecType.CP if the operation fits in driver memory 
> does not work very well. Hence, we should take the number of floating point 
> operations and the local/cluster degree of parallelism into account when 
> deciding for the execution type.  
> 4) Improved parallel read sparse binary block: Reading sparse binary block 
> matrices with clen>bclen requires a global lock on append and final 
> sequential sorting of sparse rows. We should use a more fine-grained locking 
> scheme and sort sparse rows in parallel. 





[jira] [Updated] (SYSTEMML-552) Performance features ALS-CG

2016-03-06 Thread Matthias Boehm (JIRA)

 [ 
https://issues.apache.org/jira/browse/SYSTEMML-552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias Boehm updated SYSTEMML-552:

Description: 
Over a spectrum of data sizes, ALS-CG does always perform as good as we would 
expect due to unnecessary overheads. This task captures related performance 
features:

1) Cache-conscious sparse wdivmm left/right: For large factors, the approach of 
iterating through non-zeros in W and computing dot products, leads to repeated 
(unnecessary) scans of the factors from main-memory. 
2) Preparation sparse W = (X!=0) w/ intrinsics: For scalar operations with !=0, 
there is already a special case which is however unnecessarily conservative. We 
should realize this with a plain memcopy of indices and memset 1 for values.
3) Flop-aware operator selection QuaternaryOp: For large ranks, all quaternary 
operators become really compute-intensive. In these situations, our heuristic 
of choosing ExecType.CP if the operation fits in driver memory does not work 
very well. Hence, we should take the number of floating point operations and 
the local/cluster degree of parallelism into account when deciding for the 
execution type.  

> Performance features ALS-CG
> ---
>
> Key: SYSTEMML-552
> URL: https://issues.apache.org/jira/browse/SYSTEMML-552
> Project: SystemML
>  Issue Type: Task
>Reporter: Matthias Boehm
>
> Over a spectrum of data sizes, ALS-CG does always perform as good as we would 
> expect due to unnecessary overheads. This task captures related performance 
> features:
> 1) Cache-conscious sparse wdivmm left/right: For large factors, the approach 
> of iterating through non-zeros in W and computing dot products, leads to 
> repeated (unnecessary) scans of the factors from main-memory. 
> 2) Preparation sparse W = (X!=0) w/ intrinsics: For scalar operations with 
> !=0, there is already a special case which is however unnecessarily 
> conservative. We should realize this with a plain memcopy of indices and 
> memset 1 for values.
> 3) Flop-aware operator selection QuaternaryOp: For large ranks, all 
> quaternary operators become really compute-intensive. In these situations, 
> our heuristic of choosing ExecType.CP if the operation fits in driver memory 
> does not work very well. Hence, we should take the number of floating point 
> operations and the local/cluster degree of parallelism into account when 
> deciding for the execution type.  





[jira] [Created] (SYSTEMML-552) Performance features ALS-CG

2016-03-06 Thread Matthias Boehm (JIRA)
Matthias Boehm created SYSTEMML-552:
---

 Summary: Performance features ALS-CG
 Key: SYSTEMML-552
 URL: https://issues.apache.org/jira/browse/SYSTEMML-552
 Project: SystemML
  Issue Type: Task
Reporter: Matthias Boehm





