[ 
https://issues.apache.org/jira/browse/SOLR-11838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16343656#comment-16343656
 ] 

Jeroen Steggink commented on SOLR-11838:
----------------------------------------

As a start, I think applying models for LTR or classifying documents/fields 
when indexing would be most useful.

One thing we shouldn't underestimate is data structures for Neural Networks. 
Depending on the network structure a model may depend on a specific data 
structure. For example, timeseries-vectors are very different from other 
vectors. Are we doing just bag-of-words or do we keep the order of words? How 
many fields would you like as input? How many inputs can a model have? 
(Preferably we'd support ComputationGraphs, as they are more flexible.)
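
To illustrate the bag-of-words versus word-order question, here is a minimal 
plain-Java sketch. It uses no Solr or DL4J classes; the class and method names 
(TextEncodings, bagOfWords, orderedSequence) are made up for illustration. The 
same tokens in a different order give identical bag-of-words vectors but 
different index sequences, so a model that expects one cannot be fed the other.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Solr or DL4J.
public class TextEncodings {

    // Assign each distinct token an index in first-seen order.
    public static Map<String, Integer> buildVocabulary(List<String> tokens) {
        Map<String, Integer> vocab = new LinkedHashMap<>();
        for (String token : tokens) {
            vocab.putIfAbsent(token, vocab.size());
        }
        return vocab;
    }

    // Bag-of-words: one count per vocabulary entry; word order is lost.
    public static int[] bagOfWords(List<String> tokens, Map<String, Integer> vocab) {
        int[] counts = new int[vocab.size()];
        for (String token : tokens) {
            Integer index = vocab.get(token);
            if (index != null) {
                counts[index]++;
            }
        }
        return counts;
    }

    // Ordered sequence: one vocabulary index per token; word order is kept.
    // Tokens missing from the vocabulary are skipped.
    public static int[] orderedSequence(List<String> tokens, Map<String, Integer> vocab) {
        int[] buffer = new int[tokens.size()];
        int length = 0;
        for (String token : tokens) {
            Integer index = vocab.get(token);
            if (index != null) {
                buffer[length++] = index;
            }
        }
        return Arrays.copyOf(buffer, length);
    }

    public static void main(String[] args) {
        List<String> first = Arrays.asList("solr", "ranks", "documents");
        List<String> second = Arrays.asList("documents", "ranks", "solr");
        Map<String, Integer> vocab = buildVocabulary(first);

        // Same counts for both inputs, different index sequences.
        System.out.println(Arrays.toString(bagOfWords(first, vocab)));
        System.out.println(Arrays.toString(bagOfWords(second, vocab)));
        System.out.println(Arrays.toString(orderedSequence(first, vocab)));
        System.out.println(Arrays.toString(orderedSequence(second, vocab)));
    }
}
```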

Furthermore, we should think about what is actually going to work. Having 
one-hot encoding for all terms in an index could be problematic. There is 
already a logistic regression implementation which works great for simple 
classification. If we're going to use DL4J it should add something more than 
Solr already offers.
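
A rough back-of-the-envelope sketch of why one-hot encoding every term could 
be problematic. The vocabulary size and document length below are assumed 
example values, not measurements from a real index, and the class name 
OneHotCost is made up for illustration. It compares a dense one-hot vector 
(one float per distinct term) with simply storing the term's index.

```java
// Hypothetical calculation, not part of Solr or DL4J.
public class OneHotCost {

    // Dense one-hot vector: one 4-byte float per distinct term in the index.
    public static long denseBytesPerToken(long vocabularySize) {
        return vocabularySize * Float.BYTES;
    }

    // Sparse alternative: store only the index of the single non-zero entry.
    public static long sparseBytesPerToken() {
        return Integer.BYTES;
    }

    public static void main(String[] args) {
        long vocabularySize = 1_000_000L; // assumed number of distinct terms
        long tokensPerDocument = 200L;    // assumed document length

        long dense = denseBytesPerToken(vocabularySize) * tokensPerDocument;
        long sparse = sparseBytesPerToken() * tokensPerDocument;

        System.out.println("dense one-hot bytes per document:  " + dense);
        System.out.println("sparse index bytes per document:   " + sparse);
    }
}
```

With these assumed numbers the dense form needs 4,000,000 bytes per token, 
against 4 bytes for the index alone.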

Maybe we can think of a few specific use cases to make a prototype for?

 

I think we can make a DataVec record reader for Solr (@[~kwatters]). But I 
guess this is something we can add to DataVec itself, instead of adding this to 
Solr. An alternative could be to use Solr's Streaming API to return data in a 
format which is efficient and could be directly used by DataVec.

Another thing I'd like to mention is dependencies. Instead of relying on DL4J 
specifically, we could think about abstracting data input and output for 
machine learning and applying models in general. As a DL4J user I'm not very 
interested in running it on a Solr server. I have dedicated servers running 
DL4J models which I serve using REST APIs. The reason is that I have servers 
with GPUs and lots of RAM dedicated to this type of process. Solr, on the 
other hand, can be very demanding in a different way.
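
A minimal sketch of what such an abstraction might look like, in plain Java. 
Everything here is hypothetical: the Model interface, LogisticModel and score 
are illustrative names, not existing Solr or DL4J APIs. The point is only that 
the calling code depends on the interface, so an in-process model and a client 
for a remote model server could be swapped without changing the caller.

```java
// Hypothetical abstraction, not an existing Solr or DL4J API.
public class ModelAbstraction {

    /** Anything that maps a feature vector to output values. */
    public interface Model {
        double[] predict(double[] features);
    }

    /** In-process example: logistic regression with fixed weights. */
    public static class LogisticModel implements Model {
        private final double[] weights;
        private final double bias;

        public LogisticModel(double[] weights, double bias) {
            this.weights = weights.clone();
            this.bias = bias;
        }

        @Override
        public double[] predict(double[] features) {
            if (features.length != weights.length) {
                throw new IllegalArgumentException("expected " + weights.length + " features");
            }
            double z = bias;
            for (int i = 0; i < weights.length; i++) {
                z += weights[i] * features[i];
            }
            // Sigmoid squashes the weighted sum into (0, 1).
            return new double[] {1.0 / (1.0 + Math.exp(-z))};
        }
    }

    // The caller only sees the interface. A client that calls a model served
    // elsewhere (for example over REST) could implement Model as well.
    public static double score(Model model, double[] features) {
        return model.predict(features)[0];
    }

    public static void main(String[] args) {
        Model local = new LogisticModel(new double[] {0.5, -0.25}, 0.0);
        // Stand-in for a remote model; a real one would make a network call.
        Model remoteStub = features -> new double[] {0.9};

        System.out.println(score(local, new double[] {1.0, 2.0}));
        System.out.println(score(remoteStub, new double[] {1.0, 2.0}));
    }
}
```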

 

> explore supporting Deeplearning4j NeuralNetwork models
> ------------------------------------------------------
>
>                 Key: SOLR-11838
>                 URL: https://issues.apache.org/jira/browse/SOLR-11838
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Christine Poerschke
>            Priority: Major
>         Attachments: SOLR-11838.patch
>
>
> [~yuyano] wrote in SOLR-11597:
> bq. ... If we think to apply this to more complex neural networks in the 
> future, we will need to support layers ...
> [~malcorn_redhat] wrote in SOLR-11597:
> bq. ... In my opinion, if this is a route Solr eventually wants to go, I 
> think a better strategy would be to just add a dependency on 
> [Deeplearning4j|https://deeplearning4j.org/] ...
> Creating this ticket for the idea to be explored further (if anyone is 
> interested in exploring it), complementary to and independent of the 
> SOLR-11597 RankNet related effort.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
