[ 
https://issues.apache.org/jira/browse/SPARK-23266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16348310#comment-16348310
 ] 

Chandan Misra commented on SPARK-23266:
---------------------------------------

*How big is n typically for your use case?*
To give a sense of how large the datasets used in Kriging can be, the 
following paper might interest you:
 [http://www.tandfonline.com/doi/full/10.1080/2150704X.2016.1275053]
The dataset there has 650 million points and is 18 GB in size. I think 
inverting the variance-covariance matrix C is impossible if it has to be 
processed locally on a single machine.
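
As a rough back-of-the-envelope check of that claim, the sketch below estimates the memory a covariance matrix would need. It is an illustration only: it assumes one dense n x n matrix over all points with 8-byte double entries, which is not necessarily how the cited paper sets up the problem, and the function name dense_matrix_bytes is hypothetical.

```python
def dense_matrix_bytes(n, bytes_per_entry=8):
    """Bytes needed to hold a dense n x n matrix (assumed 8-byte doubles)."""
    return n * n * bytes_per_entry


n_points = 650_000_000  # number of points quoted above
total_bytes = dense_matrix_bytes(n_points)

# Report the size in decimal exabytes (1 EB = 10**18 bytes).
print(f"dense C for n={n_points}: {total_bytes / 10**18:.2f} EB")
```

Under these assumptions the matrix needs 8 * n^2 bytes, about 3.38e18 bytes for n = 650 million, which is far beyond the memory of any single node.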

*I'm also not clear how common this operation is?*

Kriging is used extensively in many fields, such as earth science, mining, 
weather prediction, wireless sensor networks, and remote sensing applications 
like filling gaps in satellite raster images and creating Digital Elevation 
Models from LiDAR data, to name a few, and it is backed by a large number of 
research papers. There are R packages implemented solely for Kriging, such as 
gstat and geoR. But these are limited to a single node and fail when they are 
given a large dataset.

Additionally, there is ongoing research (like 
[this|https://www.spiedigitallibrary.org/journals/Journal-of-Applied-Remote-Sensing/volume-11/issue-1/016011/High-performance-parallel-approaches-for-three-dimensional-light-detection-and/10.1117/1.JRS.11.016011.short?SSO=1])
 on parallelizing Kriging with MPI, Hadoop and GPUs. One of the teams is 
[GIST at Oak Ridge National Laboratory|http://web.ornl.gov/sci/gist/res_high_performance.shtml], 
which performs geo-computation in an HPC setup. I think Spark could readily 
replace these platforms, given its benefits in this regard. Since matrix 
inversion is a core processing component of Kriging, it is highly relevant, 
and a Spark implementation would provide a hassle-free solution for a large 
fraction of researchers outside computer science.

> Matrix Inversion on BlockMatrix
> -------------------------------
>
>                 Key: SPARK-23266
>                 URL: https://issues.apache.org/jira/browse/SPARK-23266
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>    Affects Versions: 2.2.1
>            Reporter: Chandan Misra
>            Priority: Minor
>
> Matrix inversion is a basic building block for many other algorithms, such 
> as regression, classification, and geostatistical analysis using ordinary 
> kriging. A simple, efficient, distributed divide-and-conquer algorithm based 
> on Spark's BlockMatrix can be implemented using only *6* multiplications at 
> each recursion level of the algorithm. The reference paper can be found at
> [https://arxiv.org/abs/1801.04723]
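
To make the "6 multiplications per recursion level" idea concrete, here is a minimal single-machine sketch of a Strassen-style recursive block inversion. It is only an illustration of the general divide-and-conquer scheme, not the implementation from the referenced paper and not Spark code. It assumes a square matrix whose size is a power of two and whose leading blocks are invertible (true for symmetric positive-definite matrices such as covariance matrices). All function names (matmul, sub, neg, split, join, block_inverse) are hypothetical.

```python
def matmul(a, b):
    """Dense product of two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def sub(a, b):
    """Element-wise difference a - b."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def neg(a):
    """Element-wise negation."""
    return [[-x for x in row] for row in a]


def split(m):
    """Split a square matrix of even size into four equal blocks."""
    h = len(m) // 2
    return ([r[:h] for r in m[:h]], [r[h:] for r in m[:h]],
            [r[:h] for r in m[h:]], [r[h:] for r in m[h:]])


def join(c11, c12, c21, c22):
    """Reassemble four blocks into one matrix."""
    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom


def block_inverse(m):
    """Invert m recursively; assumes size 2**k and invertible leading blocks."""
    if len(m) == 1:
        return [[1.0 / m[0][0]]]
    a11, a12, a21, a22 = split(m)
    inv11 = block_inverse(a11)         # recursive inversion of A11
    p2 = matmul(a21, inv11)            # multiplication 1
    p3 = matmul(inv11, a12)            # multiplication 2
    p4 = matmul(a21, p3)               # multiplication 3
    neg_schur = sub(p4, a22)           # -(A22 - A21 * A11^-1 * A12)
    p6 = block_inverse(neg_schur)      # recursive inversion: -(Schur)^-1
    c12 = matmul(p3, p6)               # multiplication 4
    c21 = matmul(p6, p2)               # multiplication 5
    p7 = matmul(p3, c21)               # multiplication 6
    c11 = sub(inv11, p7)
    c22 = neg(p6)
    return join(c11, c12, c21, c22)


# Small symmetric positive-definite example (made-up values).
example = [[4.0, 1.0, 0.0, 0.0],
           [1.0, 4.0, 1.0, 0.0],
           [0.0, 1.0, 4.0, 1.0],
           [0.0, 0.0, 1.0, 4.0]]
example_inverse = block_inverse(example)
```

Each level performs two recursive inversions on half-size blocks plus six block multiplications. In a distributed setting those multiplications are where the heavy lifting would happen, so in principle they could be delegated to BlockMatrix products.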



