[jira] [Commented] (LUCENE-7745) Explore GPU acceleration

2019-07-03 Thread Rinka Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877934#comment-16877934
 ] 

Rinka Singh commented on LUCENE-7745:
-

{quote}The basic idea is to compute sub-histograms in each thread block with 
each thread block accumulating into the local memory. Then, when each thread 
block finishes its workload, it atomically adds the result to global memory, 
reducing the overall amount of traffic to global memory.To increase throughput 
and reduce shared memory contention, the main contribution here is that they 
actually use R "replicated" sub-histograms in each thread block, and they 
offset them so that bin 0 of the 1st histogram falls into a different memory 
bank than bin 0 of the 2nd histogram, and so on for R histograms. Essentially, 
it improves throughput in the degenerate case where multiple threads are trying 
to accumulate the same histogram bin at the same time.
{quote}
So here's what I've done/am doing:

I have a basic histogramming (including eliminating stop words) working on a 
single GPU  (I have an old Quadro 2000 with 1 GB memory) - I've tested it for a 
5MB (text file) and it seems to be working OK.

The following is how I'm implementing it - briefly.

Read a file in from command line (linux executable) into the GPU
 * convert the stream to words, chunk them into blocks
 * eliminate the stop words
 * sort/merge (including word-count) everything first inside a block and then 
across blocks - I came up with my own sort - haven't had the time to explore 
the parallel sorts out there
 * This results in a sorted histogram is held in multiple blocks in the GPU.

The advantages of this approach (to my mind) are:
 * i can scale up use the entire GPU memory.  My guess is I can create and 
manage an 8-10 GB index in a V100 (it has 32GB) - like I said, I've only tested 
with a 5 MB text file so far.
 * Easy to add fresh data into the existing histogram.  All I need to do is 
create new blocks and sort/merge them all.
 * I'm guessing this should make it easy to implement scaling across GPUs which 
means on a multi-GPU machine, I can scale to the almost the number of GPUs 
there and then of course one can setup a cluster of such machines...  This is 
far in the future though...
 * The sort is kept separate so we can experiment with various sorts and see 
which one performs best.

 

The issues are:
 * It is currently horrendously slow (I use global memory all the way and no 
optimization).  Well OK much too slow for my liking (I went over to nVidia's 
office and tested it on a K80 and it was just twice as fast as my GPU).  I'm 
currently trying to implement a shared memory version (and a few other tweaks) 
that should speed it up.
 * I have yet to do comparisons with the histogramming tools out there and so 
cannot say how much better it is.  Once I have the basic inverted index in 
place, I'll reach out to you all for the testing.
 * It is still a bit fragile - I'm still finding bugs as I test but the basic 
works.

 

Currently in process:
 * code is modified for (some) performance.  Am debugging/testing - it will 
take a while.  As of now, I feel good about what I've done but I won't know 
till I test for performance.
 * Need to add ability to handle multiple files (I think I will postpone this 
as one can always cat the files together and pass it in - that is a pretty 
simple script that can be wrapped around the executable).
 * Need to create inverted index.
 * we'll worry about searching on the index later but that should be pretty 
trivial - well actually nothing is trivial here.

 
{quote}Re: efficient histogram implementation in CUDA

If it helps, [this 
approach|https://scholar.google.com/scholar?cluster=4154868272073145366&hl=en&as_sdt=0,3]
 has been good for a balance between GPU performance and ease of implementation 
for work I've done in the past. If academic paywalls block you for all those 
results, it looks to also be available (presumably by the authors) on 
[researchgate|https://www.researchgate.net/publication/256674650_An_optimized_approach_to_histogram_computation_on_GPU]
{quote}
 Took a quick look - they are all priced products.  I will take a look at 
researchgate sometime.

I apologize but I may not be very responsive in the next month or so as we are 
in the middle of a release at work and also my night time job (this).

> Explore GPU acceleration
> 
>
> Key: LUCENE-7745
> URL: https://issues.apache.org/jira/browse/LUCENE-7745
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Major
>  Labels: gsoc2017, mentor
> Attachments: TermDisjunctionQuery.java, gpu-benchmarks.png
>
>
> There are parts of Lucene that can potentially be speeded up if comp

[jira] [Commented] (LUCENE-7745) Explore GPU acceleration

2019-07-02 Thread Joshua Mack (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16877165#comment-16877165
 ] 

Joshua Mack commented on LUCENE-7745:
-

Sounds good!

Re: efficient histogram implementation in CUDA

If it helps, [this 
approach|https://scholar.google.com/scholar?cluster=4154868272073145366&hl=en&as_sdt=0,3]
 has been good for a balance between GPU performance and ease of implementation 
for work I've done in the past. If academic paywalls block you for all those 
results, it looks to also be available (presumably by the authors) on 
[researchgate|https://www.researchgate.net/publication/256674650_An_optimized_approach_to_histogram_computation_on_GPU]

The basic idea is to compute sub-histograms in each thread block with each 
thread block accumulating into the local memory. Then, when each thread block 
finishes its workload, it atomically adds the result to global memory, reducing 
the overall amount of traffic to global memory.

To increase throughput and reduce shared memory contention, the main 
contribution here is that they actually use R "replicated" sub-histograms in 
each thread block, and they offset them so that bin 0 of the 1st histogram 
falls into a different memory bank than bin 0 of the 2nd histogram, and so on 
for R histograms. Essentially, it improves throughput in the degenerate case 
where multiple threads are trying to accumulate the same histogram bin at the 
same time.

> Explore GPU acceleration
> 
>
> Key: LUCENE-7745
> URL: https://issues.apache.org/jira/browse/LUCENE-7745
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Major
>  Labels: gsoc2017, mentor
> Attachments: TermDisjunctionQuery.java, gpu-benchmarks.png
>
>
> There are parts of Lucene that can potentially be speeded up if computations 
> were to be offloaded from CPU to the GPU(s). With commodity GPUs having as 
> high as 12GB of high bandwidth RAM, we might be able to leverage GPUs to 
> speed parts of Lucene (indexing, search).
> First that comes to mind is spatial filtering, which is traditionally known 
> to be a good candidate for GPU based speedup (esp. when complex polygons are 
> involved). In the past, Mike McCandless has mentioned that "both initial 
> indexing and merging are CPU/IO intensive, but they are very amenable to 
> soaking up the hardware's concurrency."
> I'm opening this issue as an exploratory task, suitable for a GSoC project. I 
> volunteer to mentor any GSoC student willing to work on this this summer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7745) Explore GPU acceleration

2019-06-14 Thread Rinka Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16863938#comment-16863938
 ] 

Rinka Singh commented on LUCENE-7745:
-

Hi [~mackncheesiest], All,
 A quick update. I've been going really slow (sorry 'bout that). My day job has 
consumed a lot of my time. Also, what I have working is histogramming (on text 
files) on GPUs - the problem with that is that it is horrendously slow - I use 
the GPU global memory all the way (it is just about 4-5 times faster than a 
CPU) instead of sorting in local memory. I've been trying to accelerate that 
before converting it into an inverted index. Nearly there :) you know how that 
is - the almost there syndrome... Once I get it done, I'll check it into my 
github.

Here's the lessons I learned in my journey:
 # Do all the decision making in the CPU.  See if parallelization can 
substitute for decision making - you need to think parallelization/optimization 
as part of design not as an after-thought - this is counter to what everyone 
says about optimization.  The reason is there could be SIGNIFICANT design 
changes.
 # Do as fine grained parallelism as possible.  Don't think - one cpu-thread == 
one gpu-thread.  think as parallel as possible.
 # The best metaphor I found (of working with GPUs) - think of it as an 
embedded board attached to your machine and you move data to and fro from the 
board, debug on the board.  Dump all parallel processing on the board and 
sequential on the CPU.
 # Read the nVidia Manuals (they are your best bet).  I figured it is better to 
stay with CUDA (as against OpenCL) given the wealth of CUDA info and support 
out there...
 # Writing code:
 ## explicitly think about cache memory (that's your shared memory) and 
registers (local memory) and manage them.  This is COMPLETELY different from 
writing CPU code - the compiler does this for you.
 ## Try to used const, shared and register memory as much as possible.  Avoid 
__syncthreads () if you can.
 ## Here's where the dragons lie...
 # Engineer productivity is roughly 1/10th the normal productivity (And I mean 
writing C/C++ not python).  I've written and thrown away code umpteen times - 
something that I just wouldn't need to do when writing standard code.

Having said all this, :) I've a bunch of limitations that a regular software 
engineer will not have and have been struggling to get over them.  I've been a 
manager for way too long and find it really difficult to focus on just one 
thing (the standard ADHD that most managers eventually develop).  Also, I WAS a 
C programmer, loong ago - no, not even C++ and I just haven't had the bandwidth 
to pick C++ up and then let's not even talk about my day job pressures - I do 
this for an hour or two at night (sigh)...

I will put out everything I've done once I've crossed a milestone (a working 
accelerated histogram).  Then will modify that to do inverted indexing.

Hope this helps...  In the meantime, if you want me to review 
documents/design/thoughts/anything, please feel free to mail them to me at: 
rinka (dot) singh (at) gmail.  At least ping me - I really don't look at 
the Apache messages and would probably miss something...

> Explore GPU acceleration
> 
>
> Key: LUCENE-7745
> URL: https://issues.apache.org/jira/browse/LUCENE-7745
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Major
>  Labels: gsoc2017, mentor
> Attachments: TermDisjunctionQuery.java, gpu-benchmarks.png
>
>
> There are parts of Lucene that can potentially be speeded up if computations 
> were to be offloaded from CPU to the GPU(s). With commodity GPUs having as 
> high as 12GB of high bandwidth RAM, we might be able to leverage GPUs to 
> speed parts of Lucene (indexing, search).
> First that comes to mind is spatial filtering, which is traditionally known 
> to be a good candidate for GPU based speedup (esp. when complex polygons are 
> involved). In the past, Mike McCandless has mentioned that "both initial 
> indexing and merging are CPU/IO intensive, but they are very amenable to 
> soaking up the hardware's concurrency."
> I'm opening this issue as an exploratory task, suitable for a GSoC project. I 
> volunteer to mentor any GSoC student willing to work on this this summer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7745) Explore GPU acceleration

2019-03-18 Thread Joshua Mack (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16794786#comment-16794786
 ] 

Joshua Mack commented on LUCENE-7745:
-

Hello all,

There's a chance that I may end up being a lucky graduate student tasked with 
handling something similar to this a few months down the line with application 
to a largely textual Kafka+Logstash+Elastic pipeline

[~ichattopadhyaya], I've noticed that the Github links above/referenced in the 
Activate 2018 talk now lead to 404s – I understand that it's not intended to be 
taken as the defacto method by which functionality like this should be added to 
Lucene, but having a baseline to look into with some of our investigations 
would certainly be helpful and could save us (myself) some time. If you'd be 
willing to rehost somewhere or provide updated results, I would certainly 
appreciate it

Similarly, [~rinka], coming from a naive outsider, I think your plan sounds 
good, and if you have any preliminary investigations you've performed, I would 
appreciate being able to learn from your lessons thus far. From my perspective, 
I have experience with GPUs and Solr separately, but I've never dealt with them 
together (or with Lucene directly for that matter)

Finally, we're still in a very investigative phase and I'm certainly not the 
PI, but assuming this ends up being relevant to our project, I'm sure we would 
be happy to contribute back whatever we end up doing as a part of the project

> Explore GPU acceleration
> 
>
> Key: LUCENE-7745
> URL: https://issues.apache.org/jira/browse/LUCENE-7745
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Major
>  Labels: gsoc2017, mentor
> Attachments: TermDisjunctionQuery.java, gpu-benchmarks.png
>
>
> There are parts of Lucene that can potentially be speeded up if computations 
> were to be offloaded from CPU to the GPU(s). With commodity GPUs having as 
> high as 12GB of high bandwidth RAM, we might be able to leverage GPUs to 
> speed parts of Lucene (indexing, search).
> First that comes to mind is spatial filtering, which is traditionally known 
> to be a good candidate for GPU based speedup (esp. when complex polygons are 
> involved). In the past, Mike McCandless has mentioned that "both initial 
> indexing and merging are CPU/IO intensive, but they are very amenable to 
> soaking up the hardware's concurrency."
> I'm opening this issue as an exploratory task, suitable for a GSoC project. I 
> volunteer to mentor any GSoC student willing to work on this this summer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7745) Explore GPU acceleration

2018-12-16 Thread Rinka Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16722717#comment-16722717
 ] 

Rinka Singh commented on LUCENE-7745:
-

Thank you.
As a first step let me come up with a GPU based index builder and we can look 
at query handling on the GPU as a second step.
:-) I AM going to be slooow - my apologies but I'm interested enough to put 
effort into this and will do my best.

Here's what I'll do as a first step:
Develop a stand alone executable (we can figure out modifications to directly 
use in Lucene as step 1.1) that will:
a. Read multiple files (command line) and come up with an inverted index
b. write the inverted index to stdout (I can generate a lucene index as step 
1.1)
c. will handle a stop-words file as a command line param
d. Will work on one GPU+1 thread of the CPU (I'll keep multi-GPU and multi 
threading to use all CPUs in mind but implementing that will be a separate step 
altogether).

Goal: Look at speed difference between an Index generated on the CPU vs GPU for 
just this.

We can build from there...
Thoughts please...

> Explore GPU acceleration
> 
>
> Key: LUCENE-7745
> URL: https://issues.apache.org/jira/browse/LUCENE-7745
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Major
>  Labels: gsoc2017, mentor
> Attachments: TermDisjunctionQuery.java, gpu-benchmarks.png
>
>
> There are parts of Lucene that can potentially be speeded up if computations 
> were to be offloaded from CPU to the GPU(s). With commodity GPUs having as 
> high as 12GB of high bandwidth RAM, we might be able to leverage GPUs to 
> speed parts of Lucene (indexing, search).
> First that comes to mind is spatial filtering, which is traditionally known 
> to be a good candidate for GPU based speedup (esp. when complex polygons are 
> involved). In the past, Mike McCandless has mentioned that "both initial 
> indexing and merging are CPU/IO intensive, but they are very amenable to 
> soaking up the hardware's concurrency."
> I'm opening this issue as an exploratory task, suitable for a GSoC project. I 
> volunteer to mentor any GSoC student willing to work on this this summer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7745) Explore GPU acceleration

2018-11-29 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16703835#comment-16703835
 ] 

Adrien Grand commented on LUCENE-7745:
--

bq. How critical is the inverted index to the user experience?

Completely: almost all queries run on the inverted index. Unlike other 
datastores that run queries via linear scans and allow to speed things up by 
building indices, Lucene only enables querying vian an index.

bq. What happens if the inverted index is speeded up?

Then most queries get a speed up too.

bq. How many AWS instances would usually be used for searching through ~140GB 
sized inverted index

Hard to tell, it depends on your search load, how expensive queries are, etc.

> Explore GPU acceleration
> 
>
> Key: LUCENE-7745
> URL: https://issues.apache.org/jira/browse/LUCENE-7745
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Major
>  Labels: gsoc2017, mentor
> Attachments: TermDisjunctionQuery.java, gpu-benchmarks.png
>
>
> There are parts of Lucene that can potentially be speeded up if computations 
> were to be offloaded from CPU to the GPU(s). With commodity GPUs having as 
> high as 12GB of high bandwidth RAM, we might be able to leverage GPUs to 
> speed parts of Lucene (indexing, search).
> First that comes to mind is spatial filtering, which is traditionally known 
> to be a good candidate for GPU based speedup (esp. when complex polygons are 
> involved). In the past, Mike McCandless has mentioned that "both initial 
> indexing and merging are CPU/IO intensive, but they are very amenable to 
> soaking up the hardware's concurrency."
> I'm opening this issue as an exploratory task, suitable for a GSoC project. I 
> volunteer to mentor any GSoC student willing to work on this this summer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7745) Explore GPU acceleration

2018-11-28 Thread Rinka Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16702135#comment-16702135
 ] 

Rinka Singh commented on LUCENE-7745:
-

A few questions.  How critical is the inverted index to the user experience?  
What happens if the inverted index is speeded up?

> Explore GPU acceleration
> 
>
> Key: LUCENE-7745
> URL: https://issues.apache.org/jira/browse/LUCENE-7745
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Major
>  Labels: gsoc2017, mentor
> Attachments: TermDisjunctionQuery.java, gpu-benchmarks.png
>
>
> There are parts of Lucene that can potentially be speeded up if computations 
> were to be offloaded from CPU to the GPU(s). With commodity GPUs having as 
> high as 12GB of high bandwidth RAM, we might be able to leverage GPUs to 
> speed parts of Lucene (indexing, search).
> First that comes to mind is spatial filtering, which is traditionally known 
> to be a good candidate for GPU based speedup (esp. when complex polygons are 
> involved). In the past, Mike McCandless has mentioned that "both initial 
> indexing and merging are CPU/IO intensive, but they are very amenable to 
> soaking up the hardware's concurrency."
> I'm opening this issue as an exploratory task, suitable for a GSoC project. I 
> volunteer to mentor any GSoC student willing to work on this this summer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7745) Explore GPU acceleration

2018-11-28 Thread Rinka Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701990#comment-16701990
 ] 

Rinka Singh commented on LUCENE-7745:
-

[~jpountz]
{quote}(Unrelated to your comment Rinka, but seeing activity on this issue 
reminded me that I wanted to share something) There are limited use-cases for 
GPU accelelation in Lucene due to the fact that query processing is full of 
branches, especially since we added support for impacts and WAND.{quote}

While Yes branches do impact the performance, well designed (GPU) code will 
consist of a combo of both CPU (the decision making part) and GPU code.  For 
example, I wrote a histogram as a test case that saw SIGNIFICANT acceleration 
and I also identified further code areas that can be improved.  I'm fairly sure 
(gut feel), I can squeeze out a 40-50x kind of improvement at the very least on 
a mid-sized GPU (given the time etc.,). I think things will be much, much 
better on a high end GPU and with further scale-up on a multi-gpu system...

Incidentally, this is why I want to develop a library that I can put out there 
for integration.

{quote}That said Mike initially mentioned that BooleanScorer might be one 
scorer that could benefit from GPU acceleration as it scores large blocks of 
documents at once. I just attached a specialization of a disjunction over term 
queries that should make it easy to experiment with Cuda, see the TODO in the 
end on top of the computeScores method.
{quote}

Lucene is really new to me (and so is working with Apache - sorry, I am a 
newbie to Apache) :). Please will you put links here...  

> Explore GPU acceleration
> 
>
> Key: LUCENE-7745
> URL: https://issues.apache.org/jira/browse/LUCENE-7745
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Major
>  Labels: gsoc2017, mentor
> Attachments: TermDisjunctionQuery.java, gpu-benchmarks.png
>
>
> There are parts of Lucene that can potentially be speeded up if computations 
> were to be offloaded from CPU to the GPU(s). With commodity GPUs having as 
> high as 12GB of high bandwidth RAM, we might be able to leverage GPUs to 
> speed parts of Lucene (indexing, search).
> First that comes to mind is spatial filtering, which is traditionally known 
> to be a good candidate for GPU based speedup (esp. when complex polygons are 
> involved). In the past, Mike McCandless has mentioned that "both initial 
> indexing and merging are CPU/IO intensive, but they are very amenable to 
> soaking up the hardware's concurrency."
> I'm opening this issue as an exploratory task, suitable for a GSoC project. I 
> volunteer to mentor any GSoC student willing to work on this this summer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7745) Explore GPU acceleration

2018-11-28 Thread Rinka Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701976#comment-16701976
 ] 

Rinka Singh commented on LUCENE-7745:
-

> The code is not worth a patch right now, but will soon have something. I 
> shall update on the latest state
> here as soon as I find myself some time (winding down from a hectic Black 
> Friday/Cyber Monday support schedule).

Do you think I could take a look at the code, I could do a quick review and 
perhaps add a bit of value.  I'm fine if the code is in dev state.

Would you have written up something to describe what you are doing?

> Explore GPU acceleration
> 
>
> Key: LUCENE-7745
> URL: https://issues.apache.org/jira/browse/LUCENE-7745
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Major
>  Labels: gsoc2017, mentor
> Attachments: TermDisjunctionQuery.java, gpu-benchmarks.png
>
>
> There are parts of Lucene that can potentially be speeded up if computations 
> were to be offloaded from CPU to the GPU(s). With commodity GPUs having as 
> high as 12GB of high bandwidth RAM, we might be able to leverage GPUs to 
> speed parts of Lucene (indexing, search).
> First that comes to mind is spatial filtering, which is traditionally known 
> to be a good candidate for GPU based speedup (esp. when complex polygons are 
> involved). In the past, Mike McCandless has mentioned that "both initial 
> indexing and merging are CPU/IO intensive, but they are very amenable to 
> soaking up the hardware's concurrency."
> I'm opening this issue as an exploratory task, suitable for a GSoC project. I 
> volunteer to mentor any GSoC student willing to work on this this summer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7745) Explore GPU acceleration

2018-11-28 Thread Ishan Chattopadhyaya (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701951#comment-16701951
 ] 

Ishan Chattopadhyaya commented on LUCENE-7745:
--

bq. Your thoughts please...
Thanks for your interest in this. Seems like your proposed ideas are very much 
inline with our approach that we're trying out as well. There are some initial 
experiments and results that we are performing as we speak, and I can see that 
there are benefits in the niche usecases that Adrien mentioned.

> Explore GPU acceleration
> 
>
> Key: LUCENE-7745
> URL: https://issues.apache.org/jira/browse/LUCENE-7745
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Major
>  Labels: gsoc2017, mentor
> Attachments: TermDisjunctionQuery.java, gpu-benchmarks.png
>
>
> There are parts of Lucene that can potentially be speeded up if computations 
> were to be offloaded from CPU to the GPU(s). With commodity GPUs having as 
> high as 12GB of high bandwidth RAM, we might be able to leverage GPUs to 
> speed parts of Lucene (indexing, search).
> First that comes to mind is spatial filtering, which is traditionally known 
> to be a good candidate for GPU based speedup (esp. when complex polygons are 
> involved). In the past, Mike McCandless has mentioned that "both initial 
> indexing and merging are CPU/IO intensive, but they are very amenable to 
> soaking up the hardware's concurrency."
> I'm opening this issue as an exploratory task, suitable for a GSoC project. I 
> volunteer to mentor any GSoC student willing to work on this this summer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7745) Explore GPU acceleration

2018-11-28 Thread Ishan Chattopadhyaya (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701941#comment-16701941
 ] 

Ishan Chattopadhyaya commented on LUCENE-7745:
--

Hi Rinka, Kishore and I made some progress on this and presented our current 
state of this initiative here: https://www.youtube.com/watch?v=cY_4ApOAVJQ
The code is not worth a patch right now, but will soon have something. I shall 
update on the latest state here as soon as I find myself some time (winding 
down from a hectic Black Friday/Cyber Monday support schedule).


> Explore GPU acceleration
> 
>
> Key: LUCENE-7745
> URL: https://issues.apache.org/jira/browse/LUCENE-7745
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Major
>  Labels: gsoc2017, mentor
> Attachments: TermDisjunctionQuery.java, gpu-benchmarks.png
>
>
> There are parts of Lucene that can potentially be speeded up if computations 
> were to be offloaded from CPU to the GPU(s). With commodity GPUs having as 
> high as 12GB of high bandwidth RAM, we might be able to leverage GPUs to 
> speed parts of Lucene (indexing, search).
> First that comes to mind is spatial filtering, which is traditionally known 
> to be a good candidate for GPU based speedup (esp. when complex polygons are 
> involved). In the past, Mike McCandless has mentioned that "both initial 
> indexing and merging are CPU/IO intensive, but they are very amenable to 
> soaking up the hardware's concurrency."
> I'm opening this issue as an exploratory task, suitable for a GSoC project. I 
> volunteer to mentor any GSoC student willing to work on this this summer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7745) Explore GPU acceleration

2018-11-28 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701879#comment-16701879
 ] 

Adrien Grand commented on LUCENE-7745:
--

(Unrelated to your comment Rinka, but seeing activity on this issue reminded me 
that I wanted to share something) There are limited use-cases for GPU 
accelelation in Lucene due to the fact that query processing is full of 
branches, especially since we added support for impacts and WAND. That said 
Mike initially mentioned that BooleanScorer might be one scorer that could 
benefit from GPU acceleration as it scores large blocks of documents at once. I 
just attached a specialization of a disjunction over term queries that should 
make it easy to experiment with Cuda, see the TODO in the end on top of the 
{{computeScores}} method.

> Explore GPU acceleration
> 
>
> Key: LUCENE-7745
> URL: https://issues.apache.org/jira/browse/LUCENE-7745
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Major
>  Labels: gsoc2017, mentor
> Attachments: TermDisjunctionQuery.java, gpu-benchmarks.png
>
>
> There are parts of Lucene that can potentially be speeded up if computations 
> were to be offloaded from CPU to the GPU(s). With commodity GPUs having as 
> high as 12GB of high bandwidth RAM, we might be able to leverage GPUs to 
> speed parts of Lucene (indexing, search).
> First that comes to mind is spatial filtering, which is traditionally known 
> to be a good candidate for GPU based speedup (esp. when complex polygons are 
> involved). In the past, Mike McCandless has mentioned that "both initial 
> indexing and merging are CPU/IO intensive, but they are very amenable to 
> soaking up the hardware's concurrency."
> I'm opening this issue as an exploratory task, suitable for a GSoC project. I 
> volunteer to mentor any GSoC student willing to work on this this summer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7745) Explore GPU acceleration

2018-11-28 Thread Rinka Singh (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16701855#comment-16701855
 ] 

Rinka Singh commented on LUCENE-7745:
-

Hi everyone,

I wanted to check if this issue was still open.  I have been experimenting with 
CUDA for a bit and would love to take a stab at this.

A few thoughts:
 * This is something I'll do over weekends and so I'm going to be horribly slow 
(its going to be just me on this unless you have someone working on it and I 
can collaborate with them) - would that be OK?
 * I think the right thing to do would be to build a CUDA library (C/C++), put 
JNI and then integrate it into Lucene.  If done right then I think this library 
will be useful to (and be possible to integrate with) other Analytic tools.
 * If I get it right, then I'd love to create an OS library that other OS tools 
can integrate and use (Yes, I'm thinking of an OpenCL port in the future but 
given the tools available in CUDA and my familiarity with it...)
 * Licensing is not an issue as I prefer the Apache License.
 * Testing (especially scalability testing) will be an issue - like you said, 
your setups won't have GPUs but would it be possible to rent a few GPU 
instances on the cloud (AWS, Google)?  I can do my dev testing locally as I 
have a GPU (its a  pretty old and obsolete one but good enough for my needs) on 
my dev machine.
 * It is important to get a few users who will experiment with this.  Can you 
guys help in having someone deploy, experiment and give feedback?
 * I would rather take something that is used by everyone and I'm thinking that 
indexing, filtering and searching is something that I would rather take up: 
[http://lucene.apache.org/core/7_5_0/demo/overview-summary.html#overview.description]
 ** These can certainly be accelerated.  I think I should be able to get some 
acceleration out of a GPU enabled search.
 ** The good part of this is one would able to scale volumes almost linearly on 
a multi-GPU machine.
 ** Related to the previous point (though this is in the future). I don't have 
a multi-GPU setup and will not be able to develop multi-GPU versions. I'll need 
help in getting the infrastructure to do that. We can talk about that once a 
single GPU version is done.
 ** Yes I agree that it will be better to have a separate library / classes 
doing this rather than directly integrating it into Lucene's class library.  
This suits me too as I can develop this as a separate library that other OS 
components can integrate and I can package this as part of nVidia's OS 
libraries.
 * I'm open to other alternatives - I scanned the ideas above but didn't 
consider them as they would not bring massive value to the users and I don't 
really want to experiment as I know what I'm doing.
 * Related to the previous point, I don't know Lucene (Help!! - do I really 
need to?) and will need support/hand-holding in terms of reviewing the 
identification/interfacing/design/code etc., etc.,
 * Finally, this IS GOING TO take time because thinking (and programming) 
massively parallel is completely different from writing a simple sequential 
search and sort.  How much time, think 7-10x at least given all my constraints.

If you guys like, I can write a brief (one or two paras) description of what is 
possible for indexing, searching, filtering (with zero knowledge of Lucene of 
course) to start off...

Your thoughts please...

> Explore GPU acceleration
> 
>
> Key: LUCENE-7745
> URL: https://issues.apache.org/jira/browse/LUCENE-7745
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Major
>  Labels: gsoc2017, mentor
> Attachments: gpu-benchmarks.png
>
>
> There are parts of Lucene that can potentially be speeded up if computations 
> were to be offloaded from CPU to the GPU(s). With commodity GPUs having as 
> high as 12GB of high bandwidth RAM, we might be able to leverage GPUs to 
> speed parts of Lucene (indexing, search).
> First that comes to mind is spatial filtering, which is traditionally known 
> to be a good candidate for GPU based speedup (esp. when complex polygons are 
> involved). In the past, Mike McCandless has mentioned that "both initial 
> indexing and merging are CPU/IO intensive, but they are very amenable to 
> soaking up the hardware's concurrency."
> I'm opening this issue as an exploratory task, suitable for a GSoC project. I 
> volunteer to mentor any GSoC student willing to work on this this summer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7745) Explore GPU acceleration for spatial search

2018-06-29 Thread Ishan Chattopadhyaya (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16527334#comment-16527334
 ] 

Ishan Chattopadhyaya commented on LUCENE-7745:
--

Ah, I think I wasn't clear on my intentions behind those numbers.
 
bq. if it brings any performance - I doubt that, because the call overhead 
between Java and CUDA is way too high - in contrast to Postgres where all in 
plain C/C++
I wanted to start with those experiments just to prove to myself that there are 
no significant overheads or bottlenecks (as we've feared in the past) and that 
there can be clear benefits to be realized.

I wanted to try bulk scoring, and chose the distance calculation and sorting as 
an example because (1) it leverages two fields, (2) it was fairly isolated & 
easy to try out.

In practical usecases of spatial search, the spatial filtering doesn't require 
score calculation & sorting on  the entire dataset (just those documents that 
are in the vicinity of the user point, filtered down by the geohash or bkd tree 
node); so in some sense I was trying out an absolute worst case of Lucene 
spatial search. 

Now, that I'm convinced that this overall approach works and overheads are low, 
I can now move on to looking at Lucene internals, maybe starting with scoring 
in general (BooleanScorer, for example). Other parts of Lucene/Solr that might 
see benefit could be streaming expressions (since they seem computation heavy), 
LTR re-ranking etc.


Actually incorporating all these benefits into Lucene would require 
considerable effort, and we can open subsequent JIRAs once we've had a chance 
to explore them separately. Till then, I'm inclined to keep this issue as a 
kitchen sink for all-things-GPU, if that makes sense?

> Explore GPU acceleration for spatial search
> ---
>
> Key: LUCENE-7745
> URL: https://issues.apache.org/jira/browse/LUCENE-7745
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/spatial-extras
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Major
>  Labels: gsoc2017, mentor
> Attachments: gpu-benchmarks.png
>
>
> There are parts of Lucene that can potentially be speeded up if computations 
> were to be offloaded from CPU to the GPU(s). With commodity GPUs having as 
> high as 12GB of high bandwidth RAM, we might be able to leverage GPUs to 
> speed parts of Lucene (indexing, search).
> First that comes to mind is spatial filtering, which is traditionally known 
> to be a good candidate for GPU based speedup (esp. when complex polygons are 
> involved). In the past, Mike McCandless has mentioned that "both initial 
> indexing and merging are CPU/IO intensive, but they are very amenable to 
> soaking up the hardware's concurrency."
> I'm opening this issue as an exploratory task, suitable for a GSoC project. I 
> volunteer to mentor any GSoC student willing to work on this this summer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7745) Explore GPU acceleration for spatial search

2018-06-28 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16526510#comment-16526510
 ] 

David Smiley commented on LUCENE-7745:
--

np.  Oh this caught me by surprise too!  I though this was about BooleanScorer 
or postings or something and then low and behold it's spatial -- and then I 
thought this is so non-obvious by the issue title.  So I thought it'd do a 
little JIRA gardening.

> Explore GPU acceleration for spatial search
> ---
>
> Key: LUCENE-7745
> URL: https://issues.apache.org/jira/browse/LUCENE-7745
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/spatial-extras
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Major
>  Labels: gsoc2017, mentor
> Attachments: gpu-benchmarks.png
>
>
> There are parts of Lucene that can potentially be speeded up if computations 
> were to be offloaded from CPU to the GPU(s). With commodity GPUs having as 
> high as 12GB of high bandwidth RAM, we might be able to leverage GPUs to 
> speed parts of Lucene (indexing, search).
> First that comes to mind is spatial filtering, which is traditionally known 
> to be a good candidate for GPU based speedup (esp. when complex polygons are 
> involved). In the past, Mike McCandless has mentioned that "both initial 
> indexing and merging are CPU/IO intensive, but they are very amenable to 
> soaking up the hardware's concurrency."
> I'm opening this issue as an exploratory task, suitable for a GSoC project. I 
> volunteer to mentor any GSoC student willing to work on this this summer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7745) Explore GPU acceleration for spatial search

2018-06-28 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16526506#comment-16526506
 ] 

Adrien Grand commented on LUCENE-7745:
--

Not sure why I confused names, I meant Ishan indeed. Sorry for that. I'll let 
Ishan decide how he wants to manage this issue, I'm personally fine either way, 
I'm mostly following. :) It just caught me by surprise since I was under the 
impression that we were still exploring which areas might benefit from GPU 
acceleration.

> Explore GPU acceleration for spatial search
> ---
>
> Key: LUCENE-7745
> URL: https://issues.apache.org/jira/browse/LUCENE-7745
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/spatial-extras
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Major
>  Labels: gsoc2017, mentor
> Attachments: gpu-benchmarks.png
>
>
> There are parts of Lucene that can potentially be speeded up if computations 
> were to be offloaded from CPU to the GPU(s). With commodity GPUs having as 
> high as 12GB of high bandwidth RAM, we might be able to leverage GPUs to 
> speed parts of Lucene (indexing, search).
> First that comes to mind is spatial filtering, which is traditionally known 
> to be a good candidate for GPU based speedup (esp. when complex polygons are 
> involved). In the past, Mike McCandless has mentioned that "both initial 
> indexing and merging are CPU/IO intensive, but they are very amenable to 
> soaking up the hardware's concurrency."
> I'm opening this issue as an exploratory task, suitable for a GSoC project. I 
> volunteer to mentor any GSoC student willing to work on this this summer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7745) Explore GPU acceleration for spatial search

2018-06-28 Thread David Smiley (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16526486#comment-16526486
 ] 

David Smiley commented on LUCENE-7745:
--

Mark who?  You must mean Ishan?  I think that if GPUs are used to accelerate 
different things, then they would get separate issues and not be lumped under 
one issue.  Does that sound reasonable?  Granted the problem posted started off 
as a bit of an umbrella ticket and perhaps the particular proposal Ishan is 
presenting in his most recent comment ought to go in a new issue specific to 
spatial.Accelerating Haversine calculations sounds way different to me than 
BooleanScorer stuff; no?

> Explore GPU acceleration for spatial search
> ---
>
> Key: LUCENE-7745
> URL: https://issues.apache.org/jira/browse/LUCENE-7745
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/spatial-extras
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Major
>  Labels: gsoc2017, mentor
> Attachments: gpu-benchmarks.png
>
>
> There are parts of Lucene that can potentially be speeded up if computations 
> were to be offloaded from CPU to the GPU(s). With commodity GPUs having as 
> high as 12GB of high bandwidth RAM, we might be able to leverage GPUs to 
> speed parts of Lucene (indexing, search).
> First that comes to mind is spatial filtering, which is traditionally known 
> to be a good candidate for GPU based speedup (esp. when complex polygons are 
> involved). In the past, Mike McCandless has mentioned that "both initial 
> indexing and merging are CPU/IO intensive, but they are very amenable to 
> soaking up the hardware's concurrency."
> I'm opening this issue as an exploratory task, suitable for a GSoC project. I 
> volunteer to mentor any GSoC student willing to work on this this summer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7745) Explore GPU acceleration for spatial search

2018-06-28 Thread Adrien Grand (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16526467#comment-16526467
 ] 

Adrien Grand commented on LUCENE-7745:
--

David, I'm not sure this was meant to be specific to lucene/spatial, Mark only 
mentioned it as a way to conduct an initial benchmark? The main thing that we 
identified as being a potential candidate for integration with Cuda is actually 
BooleanScorer (BS1, the one that does scoring in bulk) based on previous 
comments?

> Explore GPU acceleration for spatial search
> ---
>
> Key: LUCENE-7745
> URL: https://issues.apache.org/jira/browse/LUCENE-7745
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/spatial-extras
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Major
>  Labels: gsoc2017, mentor
> Attachments: gpu-benchmarks.png
>
>
> There are parts of Lucene that can potentially be speeded up if computations 
> were to be offloaded from CPU to the GPU(s). With commodity GPUs having as 
> high as 12GB of high bandwidth RAM, we might be able to leverage GPUs to 
> speed parts of Lucene (indexing, search).
> First that comes to mind is spatial filtering, which is traditionally known 
> to be a good candidate for GPU based speedup (esp. when complex polygons are 
> involved). In the past, Mike McCandless has mentioned that "both initial 
> indexing and merging are CPU/IO intensive, but they are very amenable to 
> soaking up the hardware's concurrency."
> I'm opening this issue as an exploratory task, suitable for a GSoC project. I 
> volunteer to mentor any GSoC student willing to work on this this summer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7745) Explore GPU acceleration

2018-06-27 Thread Ishan Chattopadhyaya (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16524890#comment-16524890
 ] 

Ishan Chattopadhyaya commented on LUCENE-7745:
--

Here [0] are some very initial experiments that I ran, along with Kishore 
Angani, a colleague at Unbxd.

1. Generic problem: Given a result set (of document hits) and a scoring 
function, return a sorted list of documents along with the computed scores.
2. Specific problem: Given (up to 11M) points and associated docids, compute 
the distance from a given query point. Return the sorted list of documents 
based on these distances.
3. GPU implementation based on Thrust library (C++ based Apache 2.0 licensed 
library), called from JNI wrapper. Timings include copying data (scores and 
sorted docids) back from GPU to host system and access from Java (via 
DirectByteBuffer).
4. CPU implementation was based on SpatialExample [1], which is perhaps not the 
fastest (points fields are better, I think).
5. Hardware: CPU is i7 5820k 4.3GHz (OC), 32GB RAM @ 2133MHz. GPU is Nvidia GTX 
1080, 11GB GDDR5 memory.

Results seem promising. The GPU is able to score 11M documents in ~50ms!. Here, 
blue is GPU and red is CPU (Lucene). 

!Screenshot from 2018-06-27 15-33-37.png!


[0] - https://github.com/chatman/gpu-benchmarks
[1] - 
https://github.com/apache/lucene-solr/blob/master/lucene/spatial-extras/src/test/org/apache/lucene/spatial/SpatialExample.java

> Explore GPU acceleration
> 
>
> Key: LUCENE-7745
> URL: https://issues.apache.org/jira/browse/LUCENE-7745
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>Priority: Major
>  Labels: gsoc2017, mentor
>
> There are parts of Lucene that can potentially be speeded up if computations 
> were to be offloaded from CPU to the GPU(s). With commodity GPUs having as 
> high as 12GB of high bandwidth RAM, we might be able to leverage GPUs to 
> speed parts of Lucene (indexing, search).
> First that comes to mind is spatial filtering, which is traditionally known 
> to be a good candidate for GPU based speedup (esp. when complex polygons are 
> involved). In the past, Mike McCandless has mentioned that "both initial 
> indexing and merging are CPU/IO intensive, but they are very amenable to 
> soaking up the hardware's concurrency."
> I'm opening this issue as an exploratory task, suitable for a GSoC project. I 
> volunteer to mentor any GSoC student willing to work on this this summer.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7745) Explore GPU acceleration

2017-04-03 Thread vikash (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15953766#comment-15953766
 ] 

vikash commented on LUCENE-7745:


oops i could not do that, i submitted my proposal by the way and if you check 
it now the latest edited format is the submitted version and sadly i could not 
change the github link, it only points to my home directory in github,  but can 
I start working still and I shall give you the link that has my working and if 
it would be possible for you , you could show to the Apache Software Foundation 
my works, will that be ok? 

> Explore GPU acceleration
> 
>
> Key: LUCENE-7745
> URL: https://issues.apache.org/jira/browse/LUCENE-7745
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>  Labels: gsoc2017, mentor
>
> There are parts of Lucene that can potentially be speeded up if computations 
> were to be offloaded from CPU to the GPU(s). With commodity GPUs having as 
> high as 12GB of high bandwidth RAM, we might be able to leverage GPUs to 
> speed parts of Lucene (indexing, search).
> First that comes to mind is spatial filtering, which is traditionally known 
> to be a good candidate for GPU based speedup (esp. when complex polygons are 
> involved). In the past, Mike McCandless has mentioned that "both initial 
> indexing and merging are CPU/IO intensive, but they are very amenable to 
> soaking up the hardware's concurrency."
> I'm opening this issue as an exploratory task, suitable for a GSoC project. I 
> volunteer to mentor any GSoC student willing to work on this this summer.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7745) Explore GPU acceleration

2017-04-03 Thread Ishan Chattopadhyaya (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15953673#comment-15953673
 ] 

Ishan Chattopadhyaya commented on LUCENE-7745:
--

If you still haven't submitted your proposal, I have an idea for you to improve 
your chances.
Include a link to a github repository in the application for your initial 
experiments.
After that, you can try to build a prototype in the next few days (until 
assessment starts) that demonstrates that you are on the right track. This is 
not strictly necessary, but just throwing out an idea that might benefit you.

All the best and regards!

> Explore GPU acceleration
> 
>
> Key: LUCENE-7745
> URL: https://issues.apache.org/jira/browse/LUCENE-7745
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>  Labels: gsoc2017, mentor
>
> There are parts of Lucene that can potentially be speeded up if computations 
> were to be offloaded from CPU to the GPU(s). With commodity GPUs having as 
> high as 12GB of high bandwidth RAM, we might be able to leverage GPUs to 
> speed parts of Lucene (indexing, search).
> First that comes to mind is spatial filtering, which is traditionally known 
> to be a good candidate for GPU based speedup (esp. when complex polygons are 
> involved). In the past, Mike McCandless has mentioned that "both initial 
> indexing and merging are CPU/IO intensive, but they are very amenable to 
> soaking up the hardware's concurrency."
> I'm opening this issue as an exploratory task, suitable for a GSoC project. I 
> volunteer to mentor any GSoC student willing to work on this this summer.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7745) Explore GPU acceleration

2017-04-03 Thread vikash (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15953579#comment-15953579
 ] 

vikash commented on LUCENE-7745:


It is my First GSOC and so it was a bit difficult for me to draft the proposal 
properly.

> Explore GPU acceleration
> 
>
> Key: LUCENE-7745
> URL: https://issues.apache.org/jira/browse/LUCENE-7745
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>  Labels: gsoc2017, mentor
>
> There are parts of Lucene that can potentially be speeded up if computations 
> were to be offloaded from CPU to the GPU(s). With commodity GPUs having as 
> high as 12GB of high bandwidth RAM, we might be able to leverage GPUs to 
> speed parts of Lucene (indexing, search).
> First that comes to mind is spatial filtering, which is traditionally known 
> to be a good candidate for GPU based speedup (esp. when complex polygons are 
> involved). In the past, Mike McCandless has mentioned that "both initial 
> indexing and merging are CPU/IO intensive, but they are very amenable to 
> soaking up the hardware's concurrency."
> I'm opening this issue as an exploratory task, suitable for a GSoC project. I 
> volunteer to mentor any GSoC student willing to work on this this summer.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7745) Explore GPU acceleration

2017-04-03 Thread Ishan Chattopadhyaya (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15953521#comment-15953521
 ] 

Ishan Chattopadhyaya commented on LUCENE-7745:
--

Hi Vikash,
I have reviewed the proposal. It is still extremely disorganized and it is not 
clear what your goals are and how you have split them up into tasks. It 
contains lots of copy paste of comments/statements from this JIRA or comments 
from the proposal itself. The level of details still seems inadequate to me.

I had proposed a possible way to structure your proposal (by splitting the 
three months into three different areas of focus, all of them I specified in 
the comments), but I don't see that you've done so. I asked you to find out, at 
least, what the default Similarity in Lucene is called (and to attempt to 
simulate the scoring for that on the GPU). It seems you have not done so.

At this point, I don't think much can be done (just 2 hours to go for 
submission). Wish you the best.

> Explore GPU acceleration
> 
>
> Key: LUCENE-7745
> URL: https://issues.apache.org/jira/browse/LUCENE-7745
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>  Labels: gsoc2017, mentor
>
> There are parts of Lucene that can potentially be speeded up if computations 
> were to be offloaded from CPU to the GPU(s). With commodity GPUs having as 
> high as 12GB of high bandwidth RAM, we might be able to leverage GPUs to 
> speed parts of Lucene (indexing, search).
> First that comes to mind is spatial filtering, which is traditionally known 
> to be a good candidate for GPU based speedup (esp. when complex polygons are 
> involved). In the past, Mike McCandless has mentioned that "both initial 
> indexing and merging are CPU/IO intensive, but they are very amenable to 
> soaking up the hardware's concurrency."
> I'm opening this issue as an exploratory task, suitable for a GSoC project. I 
> volunteer to mentor any GSoC student willing to work on this this summer.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7745) Explore GPU acceleration

2017-04-03 Thread vikash (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15953288#comment-15953288
 ] 

vikash commented on LUCENE-7745:


Hi Ishaan ,
I have changed the proposal according to your instructions, can you review it 
again?

> Explore GPU acceleration
> 
>
> Key: LUCENE-7745
> URL: https://issues.apache.org/jira/browse/LUCENE-7745
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>  Labels: gsoc2017, mentor
>
> There are parts of Lucene that can potentially be speeded up if computations 
> were to be offloaded from CPU to the GPU(s). With commodity GPUs having as 
> high as 12GB of high bandwidth RAM, we might be able to leverage GPUs to 
> speed parts of Lucene (indexing, search).
> First that comes to mind is spatial filtering, which is traditionally known 
> to be a good candidate for GPU based speedup (esp. when complex polygons are 
> involved). In the past, Mike McCandless has mentioned that "both initial 
> indexing and merging are CPU/IO intensive, but they are very amenable to 
> soaking up the hardware's concurrency."
> I'm opening this issue as an exploratory task, suitable for a GSoC project. I 
> volunteer to mentor any GSoC student willing to work on this this summer.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7745) Explore GPU acceleration

2017-04-02 Thread Ishan Chattopadhyaya (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15952849#comment-15952849
 ] 

Ishan Chattopadhyaya commented on LUCENE-7745:
--

I have left initial comments on your draft. Let me know if you want another 
round of review, perhaps after you've addressed the current comments.

> Explore GPU acceleration
> 
>
> Key: LUCENE-7745
> URL: https://issues.apache.org/jira/browse/LUCENE-7745
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>  Labels: gsoc2017, mentor
>
> There are parts of Lucene that can potentially be speeded up if computations 
> were to be offloaded from CPU to the GPU(s). With commodity GPUs having as 
> high as 12GB of high bandwidth RAM, we might be able to leverage GPUs to 
> speed parts of Lucene (indexing, search).
> First that comes to mind is spatial filtering, which is traditionally known 
> to be a good candidate for GPU based speedup (esp. when complex polygons are 
> involved). In the past, Mike McCandless has mentioned that "both initial 
> indexing and merging are CPU/IO intensive, but they are very amenable to 
> soaking up the hardware's concurrency."
> I'm opening this issue as an exploratory task, suitable for a GSoC project. I 
> volunteer to mentor any GSoC student willing to work on this this summer.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7745) Explore GPU acceleration

2017-04-02 Thread vikash (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15952838#comment-15952838
 ] 

vikash commented on LUCENE-7745:


Yeah I had already read the student manual and the deadline is 3rd April and 
its too close, in the preparation I had almost missed the deadline for 
application. OK so for the proposal my draft is here (you may comment on it and 
I will do the needful)

https://docs.google.com/document/d/1HGxU1ZudNdAboj0s9WKTWJk3anbZm86JY-abaflXoEI/edit?usp=sharing
 .

> Explore GPU acceleration
> 
>
> Key: LUCENE-7745
> URL: https://issues.apache.org/jira/browse/LUCENE-7745
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>  Labels: gsoc2017, mentor
>
> There are parts of Lucene that can potentially be speeded up if computations 
> were to be offloaded from CPU to the GPU(s). With commodity GPUs having as 
> high as 12GB of high bandwidth RAM, we might be able to leverage GPUs to 
> speed parts of Lucene (indexing, search).
> First that comes to mind is spatial filtering, which is traditionally known 
> to be a good candidate for GPU based speedup (esp. when complex polygons are 
> involved). In the past, Mike McCandless has mentioned that "both initial 
> indexing and merging are CPU/IO intensive, but they are very amenable to 
> soaking up the hardware's concurrency."
> I'm opening this issue as an exploratory task, suitable for a GSoC project. I 
> volunteer to mentor any GSoC student willing to work on this this summer.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7745) Explore GPU acceleration

2017-04-01 Thread Ishan Chattopadhyaya (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15952388#comment-15952388
 ] 

Ishan Chattopadhyaya commented on LUCENE-7745:
--

Hi Vikash,
I suggest you read the student manuals for GSoC.
Submit a proposal how you want to approach this project, including technical 
details (as much as possible) and detailed timelines.

Regarding the following:

{code}
1First, understand how BooleanScorer calls these similarity classes and 
does the scoring. There are unit tests in Lucene that can help you get there. 
This might help: https://wiki.apache.org/lucene-java/HowToContribute
2Write a standalone CUDA/OpenCL project that does the same processing on 
the GPU.
3Benchmark the speed of doing so on GPU vs. speed observed when doing the 
same through the BooleanScorer. Preferably, on a large resultset. Include time 
for copying results and scores in and out of the device memory from/to the main 
memory.
 4   Optimize step 2, if possible.
{code}

If you've already understood step 1, feel free to make a proposal on how you 
will use your GSoC coding time to achieve steps 2-4. Also, you can look at 
other stretch goals to be included in the coding time. I would consider that 
steps 2-4, if done properly and successfully, is itself a good GSoC 
contribution. And if these steps are done properly, then either Lucene 
integration can be proposed for the latter part of the coding phase (last 2-3 
weeks, I'd think), or exploratory work on other part of Lucene (apart from the 
BooleanScorer, e.g. spatial search filtering etc.) could be taken up. 

Time is running out, so kindly submit a proposal as soon as possible. You can 
submit a draft first, have one of us review it and then submit it as final 
after the review. If the deadline is too close, there might not be enough time 
for this round of review, and in such a case just submit the draft as final.

Also, remember a lot of the GPGPU coding is done on C, so 
familiarity/experience with that is a plus.

(Just a suggestion that makes sense to me, and feel free to ignore: bullet 
points work better than long paragraphs, even though the length of sentences 
can remain the same)

> Explore GPU acceleration
> 
>
> Key: LUCENE-7745
> URL: https://issues.apache.org/jira/browse/LUCENE-7745
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>  Labels: gsoc2017, mentor
>
> There are parts of Lucene that can potentially be speeded up if computations 
> were to be offloaded from CPU to the GPU(s). With commodity GPUs having as 
> high as 12GB of high bandwidth RAM, we might be able to leverage GPUs to 
> speed parts of Lucene (indexing, search).
> First that comes to mind is spatial filtering, which is traditionally known 
> to be a good candidate for GPU based speedup (esp. when complex polygons are 
> involved). In the past, Mike McCandless has mentioned that "both initial 
> indexing and merging are CPU/IO intensive, but they are very amenable to 
> soaking up the hardware's concurrency."
> I'm opening this issue as an exploratory task, suitable for a GSoC project. I 
> volunteer to mentor any GSoC student willing to work on this this summer.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7745) Explore GPU acceleration

2017-04-01 Thread vikash (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15952375#comment-15952375
 ] 

vikash commented on LUCENE-7745:


Hello all, 
I have been reading a lot about GPU working and GPU parallelization in 
particularly about General Purpose computing on Graphics Processing Units and 
have also looked into in detail the source code of the BooleanScorer.java file 
, its a nice thing and I am having no difficulty understanding its working 
since Java is my speciality so the job was quite fun . There are a few things 
that seem unclear to me but I am reading and experimenting so I will resolve 
them soon.  It is a nice idea to use gpu to perform the search and indexing 
operations on a document using the GPU and that would be faster using the GPU. 
And regarding the licencing issue, since we are generating code and as it was 
said above the code that we generate may not go to Lucene necessarily so 
assuming this happens then will licencing still be an issue if we use the 
libraries in our code? And as Uwe Schindler  said we may host the code on 
github and certainly it would not be a good idea to develop code for special 
hardware, but still we can give it a try and try to make it compatible with 
most of the hardwares. I dont mind if this code does not go to Lucene, but we 
can try to change lucene and make it better and I am preparing myself for it 
and things would stay on track with your kind mentorship .
So should I submit my proposal now or do I need to complete all the four steps 
that Ishaan told to do in the last comment and then submit my proposal? And 
which one of the ideas would you prefer to mentor me on that is which one do 
you think would be a better one to continue with? 

>Copy over and index lots of points and corresponding docids to the GPU as an 
>offline, one time operation. Then, given a query point, return top-n nearest 
>indexed points.
>Copy over and index lots of points and corresponding docids to the GPU as an 
>offline, one time operation. Then, given a polygon (complex shape), return all 
>points that lie inside the polygon.
>Benchmarking an aggregation over a DocValues field  and comparing the 
>corresponding performance when executed on the GPU. 
>Benchmarking the speed of calculations on GPU vs. speed observed when doing 
>the same through the BooleanScorer. Preferably, on a large result set with the 
>time for copying results and scores in and out of the device memory from/to 
>the main memory included?
-Vikash

> Explore GPU acceleration
> 
>
> Key: LUCENE-7745
> URL: https://issues.apache.org/jira/browse/LUCENE-7745
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>  Labels: gsoc2017, mentor
>
> There are parts of Lucene that can potentially be speeded up if computations 
> were to be offloaded from CPU to the GPU(s). With commodity GPUs having as 
> high as 12GB of high bandwidth RAM, we might be able to leverage GPUs to 
> speed parts of Lucene (indexing, search).
> First that comes to mind is spatial filtering, which is traditionally known 
> to be a good candidate for GPU based speedup (esp. when complex polygons are 
> involved). In the past, Mike McCandless has mentioned that "both initial 
> indexing and merging are CPU/IO intensive, but they are very amenable to 
> soaking up the hardware's concurrency."
> I'm opening this issue as an exploratory task, suitable for a GSoC project. I 
> volunteer to mentor any GSoC student willing to work on this this summer.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7745) Explore GPU acceleration

2017-03-28 Thread Ishan Chattopadhyaya (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15945637#comment-15945637
 ] 

Ishan Chattopadhyaya commented on LUCENE-7745:
--

bq. Java CUDA libraries exist and what their licenses
jCuda happens to be MIT, which is, afaik, compatible with Apache license.
http://www.jcuda.org/License.txt

> Explore GPU acceleration
> 
>
> Key: LUCENE-7745
> URL: https://issues.apache.org/jira/browse/LUCENE-7745
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>  Labels: gsoc2017, mentor
>
> There are parts of Lucene that can potentially be speeded up if computations 
> were to be offloaded from CPU to the GPU(s). With commodity GPUs having as 
> high as 12GB of high bandwidth RAM, we might be able to leverage GPUs to 
> speed parts of Lucene (indexing, search).
> First that comes to mind is spatial filtering, which is traditionally known 
> to be a good candidate for GPU based speedup (esp. when complex polygons are 
> involved). In the past, Mike McCandless has mentioned that "both initial 
> indexing and merging are CPU/IO intensive, but they are very amenable to 
> soaking up the hardware's concurrency."
> I'm opening this issue as an exploratory task, suitable for a GSoC project. I 
> volunteer to mentor any GSoC student willing to work on this this summer.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7745) Explore GPU acceleration

2017-03-28 Thread Ishan Chattopadhyaya (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15945632#comment-15945632
 ] 

Ishan Chattopadhyaya commented on LUCENE-7745:
--

Hi Vikash,

Regarding licensing issue:
The work done in this project would be exploratory. That code won't necessarily 
go into Lucene. When we are at a point where we see clear benefits from the 
work done here, we would then have to explore all aspects of productionizing it 
(including licensing).

Regarding next steps:
{quote}
BooleanScorer calls a lot of classes, e.g. the BM25 similarity or TF-IDF to do 
the calculation that could possibly be parallelized.
{quote}
# First, understand how BooleanScorer calls these similarity classes and does 
the scoring. There are unit tests in Lucene that can help you get there. This 
might help: https://wiki.apache.org/lucene-java/HowToContribute
# Write a standalone CUDA/OpenCL project that does the same processing on the 
GPU.
# Benchmark the speed of doing so on GPU vs. speed observed when doing the same 
through the BooleanScorer. Preferably, on a large resultset. Include time for 
copying results and scores in and out of the device memory from/to the main 
memory.
# Optimize step 2, if possible.

Once this is achieved (which in itself could be a sufficient GSoC project), one 
can have stretch goals to try out other parts of Lucene to optimize (e.g. 
spatial search).

> Explore GPU acceleration
> 
>
> Key: LUCENE-7745
> URL: https://issues.apache.org/jira/browse/LUCENE-7745
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>  Labels: gsoc2017, mentor
>
> There are parts of Lucene that can potentially be speeded up if computations 
> were to be offloaded from CPU to the GPU(s). With commodity GPUs having as 
> high as 12GB of high bandwidth RAM, we might be able to leverage GPUs to 
> speed parts of Lucene (indexing, search).
> First that comes to mind is spatial filtering, which is traditionally known 
> to be a good candidate for GPU based speedup (esp. when complex polygons are 
> involved). In the past, Mike McCandless has mentioned that "both initial 
> indexing and merging are CPU/IO intensive, but they are very amenable to 
> soaking up the hardware's concurrency."
> I'm opening this issue as an exploratory task, suitable for a GSoC project. I 
> volunteer to mentor any GSoC student willing to work on this this summer.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7745) Explore GPU acceleration

2017-03-28 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15945065#comment-15945065
 ] 

David Smiley commented on LUCENE-7745:
--

vikash: not all working code contributed to any open-source project is 
necessarily welcome.  Usually it is but sometimes project members or ASF rules 
insist on certain things for the perceived greater good.  In this case, I 
believe Uwe doesn't want Lucene to include anything that would only work with 
certain hardware or JVM vendors -- even if it was optional opt-in.  If 
hypothetically nobody had such concerns here, be aware that any 3rd party 
(non-ASF) libraries need to meet certain qualifications.  For example, *if* 
whatever Java CUDA library you find happens to be licensed as GPL, then it's 
incompatible with ASF run projects like this one.  That's a hypothetical; I 
have no idea what Java CUDA libraries exist and what their licenses are.  
Regardless... if you come up with something useful, it's probably not necessary 
that Lucene itself change, and as seen here we have some willingness to change 
Lucene (details TBD) if it enables people to use Lucene with CUDA.  Lucene has 
many extension points already.  Though I could imagine you might unfortunately 
need to copy/fork some long source files -- Uwe mentioned some.

Good luck.

> Explore GPU acceleration
> 
>
> Key: LUCENE-7745
> URL: https://issues.apache.org/jira/browse/LUCENE-7745
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>  Labels: gsoc2017, mentor
>
> There are parts of Lucene that can potentially be speeded up if computations 
> were to be offloaded from CPU to the GPU(s). With commodity GPUs having as 
> high as 12GB of high bandwidth RAM, we might be able to leverage GPUs to 
> speed parts of Lucene (indexing, search).
> First that comes to mind is spatial filtering, which is traditionally known 
> to be a good candidate for GPU based speedup (esp. when complex polygons are 
> involved). In the past, Mike McCandless has mentioned that "both initial 
> indexing and merging are CPU/IO intensive, but they are very amenable to 
> soaking up the hardware's concurrency."
> I'm opening this issue as an exploratory task, suitable for a GSoC project. I 
> volunteer to mentor any GSoC student willing to work on this this summer.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7745) Explore GPU acceleration

2017-03-28 Thread vikash (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15944884#comment-15944884
 ] 

vikash commented on LUCENE-7745:


Hi all, I have been reading about GPU acceleration and in particular to be 
precise about GPU accelerated computing I find this project very interesting 
and so can anyone give me further lead what is to be done now? I mean the ideas 
that Ishaan suggested are pretty good but I am still not able to understand 
that what Mr David means by  (a) could whatever comes of this actually be 
contributed to Lucene itself, why can the outcome of this project not be 
contributed to Lucene?

> Explore GPU acceleration
> 
>
> Key: LUCENE-7745
> URL: https://issues.apache.org/jira/browse/LUCENE-7745
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>  Labels: gsoc2017, mentor
>
> There are parts of Lucene that can potentially be speeded up if computations 
> were to be offloaded from CPU to the GPU(s). With commodity GPUs having as 
> high as 12GB of high bandwidth RAM, we might be able to leverage GPUs to 
> speed parts of Lucene (indexing, search).
> First that comes to mind is spatial filtering, which is traditionally known 
> to be a good candidate for GPU based speedup (esp. when complex polygons are 
> involved). In the past, Mike McCandless has mentioned that "both initial 
> indexing and merging are CPU/IO intensive, but they are very amenable to 
> soaking up the hardware's concurrency."
> I'm opening this issue as an exploratory task, suitable for a GSoC project. I 
> volunteer to mentor any GSoC student willing to work on this this summer.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7745) Explore GPU acceleration

2017-03-20 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15932714#comment-15932714
 ] 

Uwe Schindler commented on LUCENE-7745:
---

Hi,
in General, including CUDA into Lucene may be a good idea, but I see no real 
possibility to do this inside Lucene Core or any other module. My idea would be 
to add some abstraction to the relevant parts of Lucene and make it easier to 
"plug in" different implementations. Then this code could also be hosted 
outside Lucene (if Licenses is a problem) anywhere on Github.

We still should have the following in our head: Mike's example looks "simple" 
as a quick test if we see gains, but making the whole thing ready for commit or 
bundling in any project in/outside Lucene is a whole different story. Currently 
BooleanScorer calls a lot of classes, e.g. the BM25 similarity or TF-IDF to do 
the calculation that could possibly be parallelized. But for moving all this to 
CUDA, you have to add "plugin points" all there and change APIs completely. It 
is also hard to test, because none of our Jenkins servers has a GPU! Also for 
uses of Lucene, this could be a huge problem, if we add native stuff into 
Lucene that they may never use. Because of that it MUST BE SEPARATED from 
Lucene core. Completely...

IMHO, I'd create a full new search engine like CLucene in C code if I would 
solely focus on GPU parallelization. The current iterator based approaches are 
not easy to transform or plug into CUDA...

For the GSoc project, we should make sure to the GSoc student that this is just 
a project to "explore" GPU acceleration: if it brings any performance - I doubt 
that, because the call overhead between Java and CUDA is way too high - in 
contrast to Postgres where all in plain C/C++. The results would then be used 
to plan and investigate ways how to include that into Lucene as "plugin points" 
(e.g., as SPI modules).

> Explore GPU acceleration
> 
>
> Key: LUCENE-7745
> URL: https://issues.apache.org/jira/browse/LUCENE-7745
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>  Labels: gsoc2017, mentor
>
> There are parts of Lucene that can potentially be speeded up if computations 
> were to be offloaded from CPU to the GPU(s). With commodity GPUs having as 
> high as 12GB of high bandwidth RAM, we might be able to leverage GPUs to 
> speed parts of Lucene (indexing, search).
> First that comes to mind is spatial filtering, which is traditionally known 
> to be a good candidate for GPU based speedup (esp. when complex polygons are 
> involved). In the past, Mike McCandless has mentioned that "both initial 
> indexing and merging are CPU/IO intensive, but they are very amenable to 
> soaking up the hardware's concurrency."
> I'm opening this issue as an exploratory task, suitable for a GSoC project. I 
> volunteer to mentor any GSoC student willing to work on this this summer.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7745) Explore GPU acceleration

2017-03-20 Thread Ishan Chattopadhyaya (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15932604#comment-15932604
 ] 

Ishan Chattopadhyaya commented on LUCENE-7745:
--

[~dsmiley], that is a very important question. Afaik, there is no Apache 
compatible GPGPU framework. Both OpenCL and CUDA are likely incompatible with 
Apache (I am not fully sure). I see that jCUDA is MIT license, which is a 
wrapper around the native libraries.

If there are benefits to using GPGPU processing, my thought is that we can 
ensure all necessary plumbing in our codebase in order to offload processing to 
some plugin, whereby the user can plugin the exact GPU kernels from outside the 
Lucene distribution (if those kernels also violate any licensing restrictions 
we have). If there are clear benefits in speeding things up using a GPU, it 
would not be, *for the end-user*, the end of the world if the code comes 
outside Apache distribution.

bq.  If (a) is a "no", we need to be honest up front with the contributor.
That is a good point, and we can document this clearly.


> Explore GPU acceleration
> 
>
> Key: LUCENE-7745
> URL: https://issues.apache.org/jira/browse/LUCENE-7745
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>  Labels: gsoc2017, mentor
>
> There are parts of Lucene that can potentially be speeded up if computations 
> were to be offloaded from CPU to the GPU(s). With commodity GPUs having as 
> high as 12GB of high bandwidth RAM, we might be able to leverage GPUs to 
> speed parts of Lucene (indexing, search).
> First that comes to mind is spatial filtering, which is traditionally known 
> to be a good candidate for GPU based speedup (esp. when complex polygons are 
> involved). In the past, Mike McCandless has mentioned that "both initial 
> indexing and merging are CPU/IO intensive, but they are very amenable to 
> soaking up the hardware's concurrency."
> I'm opening this issue as an exploratory task, suitable for a GSoC project. I 
> volunteer to mentor any GSoC student willing to work on this this summer.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7745) Explore GPU acceleration

2017-03-20 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15932589#comment-15932589
 ] 

Michael McCandless commented on LUCENE-7745:


Maybe even the basic hit scoring that e.g. BooleanScorer does with disjunction 
of high frequency terms, would be amenable to GPU acceleration?  Today 
BooleanScorer processes a whole window of hits at once, doing fairly simple 
math (the Similarity methods) on each.

> Explore GPU acceleration
> 
>
> Key: LUCENE-7745
> URL: https://issues.apache.org/jira/browse/LUCENE-7745
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>  Labels: gsoc2017, mentor
>
> There are parts of Lucene that can potentially be speeded up if computations 
> were to be offloaded from CPU to the GPU(s). With commodity GPUs having as 
> high as 12GB of high bandwidth RAM, we might be able to leverage GPUs to 
> speed parts of Lucene (indexing, search).
> First that comes to mind is spatial filtering, which is traditionally known 
> to be a good candidate for GPU based speedup (esp. when complex polygons are 
> involved). In the past, Mike McCandless has mentioned that "both initial 
> indexing and merging are CPU/IO intensive, but they are very amenable to 
> soaking up the hardware's concurrency."
> I'm opening this issue as an exploratory task, suitable for a GSoC project. I 
> volunteer to mentor any GSoC student willing to work on this this summer.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7745) Explore GPU acceleration

2017-03-20 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15932568#comment-15932568
 ] 

David Smiley commented on LUCENE-7745:
--

I have a question to us all.  (a) could whatever comes of this actually be 
contributed to Lucene itself given the likelihood of requiring native O.S. 
bindings (lets presume in spatial-extras as it seems this is the only module 
that can have an external dependency), and (b) does that matter for GSOC or to 
the expectations of the contributor?  If (a) is a "no", we need to be honest up 
front with the contributor.  I know in the past Solr has been denied off-heap 
filters that would have required a un-pure Java approach.  A native binding 
would be another degree of un-purity :-)

> Explore GPU acceleration
> 
>
> Key: LUCENE-7745
> URL: https://issues.apache.org/jira/browse/LUCENE-7745
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>  Labels: gsoc2017, mentor
>
> There are parts of Lucene that can potentially be speeded up if computations 
> were to be offloaded from CPU to the GPU(s). With commodity GPUs having as 
> high as 12GB of high bandwidth RAM, we might be able to leverage GPUs to 
> speed parts of Lucene (indexing, search).
> First that comes to mind is spatial filtering, which is traditionally known 
> to be a good candidate for GPU based speedup (esp. when complex polygons are 
> involved). In the past, Mike McCandless has mentioned that "both initial 
> indexing and merging are CPU/IO intensive, but they are very amenable to 
> soaking up the hardware's concurrency."
> I'm opening this issue as an exploratory task, suitable for a GSoC project. I 
> volunteer to mentor any GSoC student willing to work on this this summer.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7745) Explore GPU acceleration

2017-03-20 Thread Ishan Chattopadhyaya (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15932556#comment-15932556
 ] 

Ishan Chattopadhyaya commented on LUCENE-7745:
--

Another experiment that, I think, is worth trying out:
* Benchmarking an aggregation over a DocValues field (e.g. using sqrt(), 
haversine distance etc.), and comparing the corresponding performance when 
executed on the GPU. This could potentially speed up scoring of results. 

For reference, Postgresql seems to have experienced speedup in some areas (esp. 
aggregations over column oriented fields): 
https://www.slideshare.net/kaigai/gpgpu-accelerates-postgresql

> Explore GPU acceleration
> 
>
> Key: LUCENE-7745
> URL: https://issues.apache.org/jira/browse/LUCENE-7745
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>  Labels: gsoc2017, mentor
>
> There are parts of Lucene that can potentially be speeded up if computations 
> were to be offloaded from CPU to the GPU(s). With commodity GPUs having as 
> high as 12GB of high bandwidth RAM, we might be able to leverage GPUs to 
> speed parts of Lucene (indexing, search).
> First that comes to mind is spatial filtering, which is traditionally known 
> to be a good candidate for GPU based speedup (esp. when complex polygons are 
> involved). In the past, Mike McCandless has mentioned that "both initial 
> indexing and merging are CPU/IO intensive, but they are very amenable to 
> soaking up the hardware's concurrency."
> I'm opening this issue as an exploratory task, suitable for a GSoC project. I 
> volunteer to mentor any GSoC student willing to work on this this summer.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7745) Explore GPU acceleration

2017-03-19 Thread Ishan Chattopadhyaya (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15931975#comment-15931975
 ] 

Ishan Chattopadhyaya commented on LUCENE-7745:
--

Here are some ideas on things to start out with:
# Copy over and index lots of points and corresponding docids to the GPU as an 
offline, one time operation. Then, given a query point, return top-n nearest 
indexed points.
# Copy over and index lots of points and corresponding docids to the GPU as an 
offline, one time operation. Then, given a polygon (complex shape), return all 
points that lie inside the polygon.

In both the cases, compare performance against existing Lucene spatial search. 
One would need to choose the most suitable algorithm for doing these as 
efficiently as possible. Any GPGPU API can be used for now (OpenCL, CUDA) for 
initial exploration.

[~dsmiley], [~kwri...@metacarta.com], [~nknize], [~mikemccand], given your 
depth and expertise in this area, do you have any suggestions? Any other area 
of Lucene that comes to mind which should be easiest to start with, in terms of 
exploring GPU based parallelization?

> Explore GPU acceleration
> 
>
> Key: LUCENE-7745
> URL: https://issues.apache.org/jira/browse/LUCENE-7745
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>  Labels: gsoc2017, mentor
>
> There are parts of Lucene that can potentially be speeded up if computations 
> were to be offloaded from CPU to the GPU(s). With commodity GPUs having as 
> high as 12GB of high bandwidth RAM, we might be able to leverage GPUs to 
> speed parts of Lucene (indexing, search).
> First that comes to mind is spatial filtering, which is traditionally known 
> to be a good candidate for GPU based speedup (esp. when complex polygons are 
> involved). In the past, Mike McCandless has mentioned that "both initial 
> indexing and merging are CPU/IO intensive, but they are very amenable to 
> soaking up the hardware's concurrency."
> I'm opening this issue as an exploratory task, suitable for a GSoC project. I 
> volunteer to mentor any GSoC student willing to work on this this summer.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7745) Explore GPU acceleration

2017-03-16 Thread vikash (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-7745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15928714#comment-15928714
 ] 

vikash commented on LUCENE-7745:


Hi I am willing to work on this.

> Explore GPU acceleration
> 
>
> Key: LUCENE-7745
> URL: https://issues.apache.org/jira/browse/LUCENE-7745
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ishan Chattopadhyaya
>  Labels: gsoc2017, mentor
>
> There are parts of Lucene that can potentially be speeded up if computations 
> were to be offloaded from CPU to the GPU(s). With commodity GPUs having as 
> high as 12GB of high bandwidth RAM, we might be able to leverage GPUs to 
> speed parts of Lucene (indexing, search).
> First that comes to mind is spatial filtering, which is traditionally known 
> to be a good candidate for GPU based speedup (esp. when complex polygons are 
> involved). In the past, Mike McCandless has mentioned that "both initial 
> indexing and merging are CPU/IO intensive, but they are very amenable to 
> soaking up the hardware's concurrency."
> I'm opening this issue as an exploratory task, suitable for a GSoC project. I 
> volunteer to mentor any GSoC student willing to work on this this summer.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org