Re: Using the Maven Indexer

2015-01-12 Thread Eduard Moraru

Hi,

I've just created two pull requests:
https://github.com/apache/maven-indexer/pull/10 (indexer-cli not working as
expected)
https://github.com/apache/maven-indexer/pull/11 (what I mentioned about the
ArtifactInfo's repositoryID being null)

Hope you have the time to have a look.

I'm particularly interested in #11. If it is accepted, I might continue
exploring option 1) from my previous post, which is currently blocked
by the issue described in 1.1).

Thanks,
Eduard

Re: Using the Maven Indexer

2014-12-08 Thread Eduard Moraru

Hi,

I have a new challenge for your maven-indexer expertise :)

What about adding additional information to the local index? I see the
default indexers (min, etc.) produce really minimal information. The
problem is that everybody uses these default indexers, and all the
available indexes (Maven Central, etc.) offer very little information that
you can actually use in an application beyond really basic name,
description, groupId, artifactId, etc. queries.

For instance, if I wanted to add author information (to query by
author) or dependency information (to perform compatibility checks against
an installation/group of installed artifacts), or anything else for that
matter, what would be the recommended approach?

From what I have researched so far, I see 2 options:

1) Have a custom IndexCreator that uses the updateDocument(ArtifactInfo
artifactInfo, Document document) method to fetch (HTTP GET) the pom.xml
using information from the artifactInfo object (repository, groupId,
artifactId, classifier, version, etc.), so that the resulting document
contains the extra information. It seems that IndexCreators are used for a
lot more than their descriptions advertise: not only for indexing new
items, but also when converting between ArtifactInfo objects and Lucene
Documents.

1.1) I had initially started down this path, but then I realized that
the artifactInfo I receive in this method does not provide basic
information (i.e. artifactInfo.getRepository() always returns null ;-( ). It
would be awesome if information like the context and/or repository were
added to the artifactInfo object (maybe in
IndexUtils.constructArtifactInfo( Document doc, IndexingContext context )?),
the same way the ArtifactInfo.UINFO and ArtifactInfo.LAST_MODIFIED
fields are handled specially and explicitly added to a new Document that is
passed to the IndexCreators.

2) Handle this separately from Maven Indexer's work, right after
index/update operations, i.e. let maven-indexer update the local index with
information from the remote index and then manipulate the underlying
Lucene index directly, adding information retrieved over the network
(HTTP GET) from the remote repository's POM files. In rough pseudocode
(the method names are placeholders), something like:

indexer.update(repoX);
indexer.getAllIndexedArtifacts().forEach(artifact -> {
  var extraData = getExtraData(repoX, artifact);
  indexer.getLuceneIndex().add(artifact, extraData);
});

3) Any other suggestions?
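Options 1) and 2) both start from the same step: turning the coordinates already stored in the index (groupId, artifactId, version) into the URL of the artifact's POM. A minimal sketch, assuming the standard Maven 2 repository layout; the class name `PomLocator` and the example repository URL are hypothetical, and this is not part of the Maven Indexer API.

```java
import java.util.Objects;

/**
 * Hypothetical helper: locates an artifact's POM using the standard Maven 2 layout
 * groupId (dots become slashes) / artifactId / version / artifactId-version.pom
 */
final class PomLocator {
    private PomLocator() {}

    /** Repository-relative POM path, e.g. "com/example/app/demo/1.0/demo-1.0.pom". */
    static String pomPath(String groupId, String artifactId, String version) {
        Objects.requireNonNull(groupId, "groupId");
        Objects.requireNonNull(artifactId, "artifactId");
        Objects.requireNonNull(version, "version");
        return groupId.replace('.', '/') + "/" + artifactId + "/" + version
                + "/" + artifactId + "-" + version + ".pom";
    }

    /** Absolute POM URL below a repository base URL; a trailing slash on the base is optional. */
    static String pomUrl(String repositoryUrl, String groupId, String artifactId, String version) {
        String base = repositoryUrl.endsWith("/") ? repositoryUrl : repositoryUrl + "/";
        return base + pomPath(groupId, artifactId, version);
    }

    public static void main(String[] args) {
        // Hypothetical repository and coordinates.
        System.out.println(pomUrl("https://repo.example.com/maven2", "com.example.app", "demo", "1.0"));
        // prints https://repo.example.com/maven2/com/example/app/demo/1.0/demo-1.0.pom
    }
}
```

The classifier is deliberately ignored, since all classified artifacts of one version share a single POM. For SNAPSHOT versions, remote repositories usually store timestamped file names, so this simple rule only holds for releases.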

My ultimate goal (besides basic name/description queries) is to be able to
perform compatibility queries on artifacts coming from multiple
repositories, so I need to find a way to add this missing information
(artifact dependencies, and maybe more).
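For the compatibility queries described above, the dependency information would first have to be extracted from each fetched POM. A sketch using only the JDK's DOM parser; `PomDependencies` is a hypothetical helper, and it deliberately reads only the POM itself: versions inherited from a parent POM or `dependencyManagement`, and `${...}` properties, are left unresolved (that would need Maven's model-building logic or an equivalent).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class PomDependencies {
    private PomDependencies() {}

    /**
     * Returns "groupId:artifactId:version" for each entry of the POM's top-level
     * <dependencies> section. A missing version yields an empty string.
     */
    static List<String> directDependencies(String pomXml) {
        Element project;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // POMs fetched over the network are untrusted input: refuse DOCTYPEs (blocks XXE).
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            project = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(pomXml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not a parseable POM", e);
        }
        List<String> result = new ArrayList<>();
        // Only the direct <project><dependencies> child; dependencyManagement and plugin
        // dependencies sit deeper in the tree and are therefore skipped.
        Element deps = child(project, "dependencies");
        if (deps == null) {
            return result;
        }
        for (Node n = deps.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && "dependency".equals(n.getNodeName())) {
                Element d = (Element) n;
                result.add(text(d, "groupId") + ":" + text(d, "artifactId") + ":" + text(d, "version"));
            }
        }
        return result;
    }

    /** First direct child element with the given (unprefixed) name, or null. */
    private static Element child(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && name.equals(n.getNodeName())) {
                return (Element) n;
            }
        }
        return null;
    }

    /** Trimmed text of a direct child element, or "" if absent. */
    private static String text(Element parent, String name) {
        Element e = child(parent, name);
        return e == null ? "" : e.getTextContent().trim();
    }
}
```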

As before, your help and suggestions are most welcome.

Thanks,
Eduard

Re: Using the Maven Indexer

2014-12-08 Thread Tamas Cservenak

Hi Eduard,

for additional information see:
http://jira.codehaus.org/browse/MINDEXER-81

Currently, ArtifactInfo is hardwired; it is not extensible.

Regarding the index available for Central: the decision driver was not
what is “minimally usable”, but the SIZE of the index download.
We experimented with different creators, but the bandwidth
they consumed (compared to artifact downloads) was really huge.

Besides, almost everyone uses MRMs (Maven repository managers), and these
tend to “improve” the basic GAV index Central publishes (i.e. once Nexus
caches a JAR file, it “improves” the index with the class names in that
JAR too, something Central does not publish).
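To illustrate the kind of “improvement” described above: once a repository manager holds the JAR locally, the class names can be read straight from its entries. A sketch using only `java.util.zip`; it is not Nexus's actual implementation, `JarClassNames` is hypothetical, and `sampleJar` exists only to make the demo self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class JarClassNames {
    private JarClassNames() {}

    /** Lists fully qualified class names (nested classes keep their '$') found in a JAR's bytes. */
    static List<String> classNames(byte[] jarBytes) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(jarBytes))) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                String n = e.getName();
                // Skip directories, META-INF content, resources and module/package descriptors.
                if (e.isDirectory() || n.startsWith("META-INF/") || !n.endsWith(".class")
                        || n.endsWith("module-info.class") || n.endsWith("package-info.class")) {
                    continue;
                }
                names.add(n.substring(0, n.length() - ".class".length()).replace('/', '.'));
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return names;
    }

    /** Demo helper: builds an in-memory JAR containing empty entries with the given names. */
    static byte[] sampleJar(String... entryNames) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(buf)) {
            for (String name : entryNames) {
                out.putNextEntry(new ZipEntry(name));
                out.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        byte[] jar = sampleJar("META-INF/MANIFEST.MF", "com/example/Foo.class", "com/example/Foo$Bar.class");
        System.out.println(classNames(jar)); // [com.example.Foo, com.example.Foo$Bar]
    }
}
```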

artifactInfo#repoId should not return null if asked via the context.
If it does, there is a bug lurking somewhere.

The “extra info” path is currently viable, but it would create a lot
of cruft around the indexer classes.

-- 
Thanks,
~t~

Re: Using the Maven Indexer

2014-11-26 Thread Eduard Moraru

On Tue, Nov 25, 2014 at 12:22 PM, Tamas Cservenak ta...@cservenak.net
wrote:

Hi there,

1) Yes, the indexing context retains the artefact “origin” (i.e. repo), so you
need one context per origin. Sadly, one index per context is a current
limitation of Maven Indexer, but this problem is known. Created
http://jira.codehaus.org/browse/MINDEXER-93

2) Yes, a merged context basically delegates to its member contexts. Under
the hood, it uses Lucene’s MultiReader to actually perform the search.

I have solved the search problem for now by using the SearchEngine
component and issuing an IteratorSearchRequest on a list of
IndexingContexts to get paginated results. I will have to see how that
works in the long run.

Thanks,
Eduard

Re ranking, there are already issues (or the problem is spread across multiple
issues), most notably this one:
http://jira.codehaus.org/browse/MINDEXER-8

3) I think yes. Indexer is currently being transitioned from Plexus to
JSR330, and as you can see in the examples, it should work with any container
supporting it. Re “manual wiring”: in the latest releases you might be able
to do it, but in older ones probably not, as Plexus supported field
injection only, and some of those members were not exposed via getters/setters.
See
http://jira.codehaus.org/browse/MINDEXER-80
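The concern about merged scores comes down to term statistics such as document frequency. As far as we understand Lucene, a composite reader like MultiReader reports statistics aggregated over all its sub-readers, so every hit is scored against the same numbers, whereas querying each index separately and merging the hits mixes statistics from different collections. A toy calculation with the simplified idf = ln(N / df), which is not Lucene's actual similarity formula; `IdfDemo` and the document counts are invented for illustration.

```java
import java.util.Locale;

final class IdfDemo {
    private IdfDemo() {}

    /** Simplified inverse document frequency ln(N / df); Lucene's real formulas differ. */
    static double idf(long numDocs, long docFreq) {
        if (docFreq <= 0 || docFreq > numDocs) {
            throw new IllegalArgumentException("need 0 < docFreq <= numDocs");
        }
        return Math.log((double) numDocs / docFreq);
    }

    public static void main(String[] args) {
        // Invented figures: a term is common in repository A and rare in repository B.
        long docsA = 1_000, dfA = 500;
        long docsB = 1_000, dfB = 10;

        // Querying each index on its own: each one uses only its local statistics.
        System.out.printf(Locale.ROOT, "A alone:  %.3f%n", idf(docsA, dfA)); // ln(2), about 0.693
        System.out.printf(Locale.ROOT, "B alone:  %.3f%n", idf(docsB, dfB)); // ln(100), about 4.605

        // One logical index over both: counts and frequencies are summed,
        // so hits from A and B are weighted with the same idf.
        System.out.printf(Locale.ROOT, "combined: %.3f%n", idf(docsA + docsB, dfA + dfB)); // about 1.366
    }
}
```

A match in repository B is weighted by roughly 4.6 when B is searched alone but only about 1.37 in the combined view, which is why scores from separately queried indexes are not directly comparable.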

--
Thanks,
~t~

On 21 Nov 2014 at 18:08:26, Eduard Moraru (enygma2...@gmail.com) wrote:

Hi,

I have recently started playing with the Maven Indexer [1], following the
examples [2], and I have some questions (since AFAIS, documentation on the
matter is practically nonexistent):

1) From what I understand, you need an IndexingContext for each
repository you plan to index. This leaves you with n Lucene indexes,
one for each repository. Is there any way I could have just one Lucene
index, with all my repositories indexed in the same place? If the main
purpose is searching, why scatter the indexed information across n indexes
and make the whole process difficult? Maybe I'm missing something.

2) Along the same lines as the first question, when it comes to searching, it
seems that I can use a MergedIndexingContext to perform a search on
multiple (all) indexed repositories (IndexingContexts). How does this merge
the search results? I assume it takes each Lucene index and queries it
individually, but this probably means that the Lucene scores of these
merged results are completely messed up and unreliable, right?
Any suggestions on how to properly perform a search over multiple indexed
repositories?

3) About the Plexus container: am I forced to initialize and use one, or
can I (or should I) manually instantiate the default implementations and use
them instead?

I'll probably come up with more questions along the way; I hope someone will
find the time to guide me down the right path.

Thanks,
Eduard

--
[1] https://github.com/apache/maven-indexer/
[2] https://github.com/apache/maven-indexer/tree/master/indexer-examples/indexer-examples-basic

Re: Using the Maven Indexer

2014-11-25 Thread Tamas Cservenak

Hi there,

1) Yes, the indexing context retains the artefact “origin” (i.e. repo), so you need
one context per origin. Sadly, one index per context is a current limitation of
Maven Indexer, but this problem is known. Created
http://jira.codehaus.org/browse/MINDEXER-93

2) Yes, a merged context basically delegates to its member contexts. Under the
hood, it uses Lucene’s MultiReader to actually perform the search.

Re ranking, there are already issues (or the problem is spread across multiple
issues), most notably this one:
http://jira.codehaus.org/browse/MINDEXER-8

3) I think yes. Indexer is currently being transitioned from Plexus to JSR330,
and as you can see in the examples, it should work with any container supporting
it. Re “manual wiring”: in the latest releases you might be able to do it, but in
older ones probably not, as Plexus supported field injection only, and some of
those members were not exposed via getters/setters.
See
http://jira.codehaus.org/browse/MINDEXER-80

--
Thanks,
~t~

Re: Using the Maven Indexer

2014-11-25 Thread Tamas Cservenak

Maven Indexer requires filesystem access; hence, you can index only local repositories.
To have an indexing context for a remote repository, you must ensure that the remote
repository publishes its index. If the remote does not publish it, you have no
other choice than to nag the remote repository owner to publish it.

There are working examples in the repository that show how to get the Central
index and keep it fresh (Central is a remote repository that does publish its index).
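Following this answer, the first thing to check for a remote repository is whether it publishes an index at all. A sketch, assuming the layout Central uses for its published index (a `.index/` directory containing `nexus-maven-repository-index.gz` and a `nexus-maven-repository-index.properties` descriptor) and that other publishing repositories follow it; `RemoteIndexLocator` is hypothetical, it needs Java 11+ for `java.net.http`, and downloading and applying the index is left to Maven Indexer's updater.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class RemoteIndexLocator {
    private RemoteIndexLocator() {}

    // Assumed conventional layout (as used by Central): <repo>/.index/nexus-maven-repository-index.*
    static final String INDEX_DIR = ".index";
    static final String INDEX_NAME = "nexus-maven-repository-index";

    /** URL of the index descriptor (properties file). */
    static String propertiesUrl(String repositoryUrl) {
        return base(repositoryUrl) + INDEX_DIR + "/" + INDEX_NAME + ".properties";
    }

    /** URL of the full (non-incremental) index archive. */
    static String fullIndexUrl(String repositoryUrl) {
        return base(repositoryUrl) + INDEX_DIR + "/" + INDEX_NAME + ".gz";
    }

    /** True if the repository answers a HEAD request for the index descriptor with 200 OK. */
    static boolean publishesIndex(HttpClient client, String repositoryUrl)
            throws IOException, InterruptedException {
        HttpRequest head = HttpRequest.newBuilder(URI.create(propertiesUrl(repositoryUrl)))
                .method("HEAD", HttpRequest.BodyPublishers.noBody())
                .build();
        return client.send(head, HttpResponse.BodyHandlers.discarding()).statusCode() == 200;
    }

    private static String base(String url) {
        return url.endsWith("/") ? url : url + "/";
    }
}
```

Note that `HttpClient` does not follow redirects by default; a client built with `HttpClient.newBuilder().followRedirects(HttpClient.Redirect.NORMAL).build()` is probably what you want against real repositories.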

--
Thanks,
~t~

On 24 Nov 2014 at 12:46:37, Eduard Moraru (enygma2...@gmail.com) wrote:

Hi,

I have a new question: How can I index a remote repository? All the
examples I have found, and even the NexusIndexerCli, seem to be focused on
indexing *only* local repositories and then publishing this index for
consumption.

I do not want to do that. Is there any way I can pass a URL to an
IndexPackingRequest instead of a (local) directory? Basically, my use case
is:

1. Take a Maven repository URL
2. If it has an index already created, use it through an
IndexUpdateRequest
3. If not, create a local index (IndexPackingRequest?)
4. In both cases, I then need to periodically update/synchronize my local
index of the remote repository.

Any help is deeply appreciated.

Thanks,
Eduard

Re: Using the Maven Indexer

2014-11-24 Thread Eduard Moraru

Hi,

I do not want to do that. Is there any way I can pass a URL to an
IndexPackingRequest instead of a (local) directory? Basically, my use case
is:

Any help is deeply appreciated.

Thanks,
Eduard

Re: Using the Maven Indexer

2014-11-24 Thread Hervé BOUTEMY

indexing is really meant to be done on the local filesystem: the target is not
really local repositories, but direct filesystem access to the remote
repository's content

working through the network causes performance issues, and you won't get some
information, like the last-modified time or directory listings
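To make this concrete: with direct filesystem access, a scan gets the directory listing and each file's last-modified time for free, which plain HTTP does not reliably provide. A minimal JDK-only sketch, not Maven Indexer's own scanner, assuming `repoRoot` points at a repository stored in the standard Maven 2 layout; `LocalRepoScan` is hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

final class LocalRepoScan {
    private LocalRepoScan() {}

    /** Maps each POM's repository-relative path ('/' separators) to its last-modified time. */
    static Map<String, FileTime> pomsWithTimestamps(Path repoRoot) {
        Map<String, FileTime> result = new TreeMap<>();
        // Files.walk supplies the "directory listing" that HTTP access lacks.
        try (Stream<Path> files = Files.walk(repoRoot)) {
            files.filter(Files::isRegularFile)
                 .filter(p -> p.getFileName().toString().endsWith(".pom"))
                 .forEach(p -> {
                     try {
                         String rel = repoRoot.relativize(p).toString().replace('\\', '/');
                         // The last-modified time comes straight from the filesystem.
                         result.put(rel, Files.getLastModifiedTime(p));
                     } catch (IOException e) {
                         throw new UncheckedIOException(e);
                     }
                 });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }
}
```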

note that the latest site is published at
http://maven.apache.org/maven-indexer-archives/maven-indexer-LATEST/

this can help you, since a lot of work went into the documentation

Regards,

Hervé

Using the Maven Indexer

2014-11-21 Thread Eduard Moraru

Hi,

I have recently started playing with the Maven Indexer [1], following the
examples [2], and I have some questions (since AFAIS, documentation on the
matter is practically nonexistent):

3) About the Plexus container: am I forced to initialize and use one, or
can I (or should I) manually instantiate the default implementations and use
them instead?

I'll probably come up with more questions along the way; I hope someone will
find the time to guide me down the right path.

Thanks,
Eduard

--
[1] https://github.com/apache/maven-indexer/
[2]
https://github.com/apache/maven-indexer/tree/master/indexer-examples/indexer-examples-basic
