Displaying a particular field on result

2014-06-03 Thread Bayu Widyasanyata
Hi,

I'm sorry if this is a frequently asked question.

In the default Solr schema.xml we define an author field like the following:

<field name="author" type="text_general" stored="true" indexed="true"/>

But this field does not seem to be parsed (by Nutch) or indexed (by Solr).
My queries always return a null result for the author field, even though
some documents (PDFs) do have author metadata.

How can I display it?
What should I have prepared during fetch & parsing that I missed?
Any documents/links on this issue?

Thanks in advance.

-- 
wassalam,
[bayu]
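
A quick way to check whether the author field comes back at all is to request
it explicitly with the fl parameter. This is a minimal sketch, assuming a core
named mycollection (the name used in the 2013 threads below) and that at least
one document has been indexed:

curl "http://localhost:8080/solr/mycollection/select?q=*:*&fl=id,author&wt=json&indent=true"

If author is missing from every returned document, the field was either never
sent by Nutch or not stored by Solr.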


Re: Displaying a particular field on result

2014-06-03 Thread Bayu Widyasanyata
Hi Ahmet,

I was just referring to Solr's schema.xml, which contains this field
definition; in this case, for example, the author field. I was also
referring to the Solr query result, which I ran through the Solr Admin
page and which didn't return the author field.
CMIIW.

Thanks.-


On Wed, Jun 4, 2014 at 5:19 AM, Ahmet Arslan iori...@yahoo.com.invalid
wrote:

 Hi Bayu,

 I think this is a nutch question, no?

 Ahmet




-- 
wassalam,
[bayu]


Re: Displaying a particular field on result

2014-06-03 Thread Bayu Widyasanyata
Hi Alexandre,

I've already played with the fl parameter in the Admin UI, but the result
is not what I expected.

From what I understand, the Solr index structure is defined in Solr's
schema.xml. In that file we define, for example, an author field to store
author content in the Solr index.

Even when I put author in the fl parameter in the Admin UI, the query
never shows its contents, even though I have (PDF/doc) documents with
author metadata.

How can I display that field?
Or, one step back, how can I verify that the field is actually stored in
Solr?


On Wed, Jun 4, 2014 at 8:59 AM, Alexandre Rafalovitch arafa...@gmail.com
wrote:

 Are you looking for the 'fl' parameter by any chance:

 https://cwiki.apache.org/confluence/display/solr/Common+Query+Parameters#CommonQueryParameters-Thefl(FieldList)Parameter
 ?

 It's in the Admin UI as well.

 If not, then you really do need to rephrase your question. Maybe by
 giving a very specific example.

 Regards,
Alex.
 Personal website: http://www.outerthoughts.com/
 Current project: http://www.solr-start.com/ - Accelerating your Solr
 proficiency






-- 
wassalam,
[bayu]
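
To verify whether the author field holds any indexed values at all, the Luke
request handler is useful; a sketch, again assuming a core named mycollection
(the /admin/luke endpoint is registered implicitly in Solr 4.x):

curl "http://localhost:8080/solr/mycollection/admin/luke?fl=author&numTerms=10&wt=json"

The response reports the field's schema flags (indexed/stored) and its top
terms; if no terms and no docs are reported, nothing was ever indexed into
the field.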


Re: Displaying a particular field on result

2014-06-03 Thread Bayu Widyasanyata
Thank you Alexandre!
I will check my configurations again.


On Wed, Jun 4, 2014 at 9:14 AM, Alexandre Rafalovitch arafa...@gmail.com
wrote:

 OK, here is the question as I understood it now:

 "I am importing data from Nutch into Solr. One of the fields is
 author and I have defined it in Solr's schema.xml. Unfortunately, it
 is always empty when I check the records in Solr's Admin UI. How
 can I confirm that the field was actually indexed into Solr?"

 In which case, this answer should solve it: Nutch is either not
 extracting the field or trying to push it to Solr with a wrong name.
 Try switching the "*" dynamic field definition to catch it (stored="true").
 Alternatively, disable the "*" definition altogether and see which fields
 fail (might be a lot of them, though).

 Regards,
Alex.
 Personal website: http://www.outerthoughts.com/
 Current project: http://www.solr-start.com/ - Accelerating your Solr
 proficiency






-- 
wassalam,
[bayu]
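
For reference, the catch-all definition Alexandre mentions would look roughly
like this in schema.xml; this is a sketch only, and the field type and
multiValued setting are assumptions:

<dynamicField name="*" type="text_general" indexed="true" stored="true" multiValued="true"/>

With stored="true" here, any field Nutch pushes under an unexpected name will
at least show up in query results, which makes a naming mismatch visible.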


Re: Auto optimized of Solr indexing results

2013-12-02 Thread Bayu Widyasanyata
Thanks Erick for your advice and for sharing.

Regards,


On Mon, Dec 2, 2013 at 11:06 PM, Erick Erickson erickerick...@gmail.com wrote:

 TieredMergePolicy is the default; even though it's
 commented out in solrconfig, it's still being used.
 So there's nothing to do.

 Given the size of your index, you can actually do
 whatever you please. Optimizing it will shrink its size,
 but frankly your index is so small I doubt you'll see any
 noticeable difference. They'll self-purge as you re-crawl
 eventually.

 In all, I think you can mostly ignore the issue.

 Best,
 Erick






-- 
wassalam,
[bayu]


Re: Auto optimized of Solr indexing results

2013-12-01 Thread Bayu Widyasanyata
Hi Erick,

After waiting about a week (I did daily crawling & indexing),
here is the docs summary:

Num Docs:      9738
Max Doc:      15311
Deleted Docs:  5573
Version:        781
Segment Count:    5

Deleted docs are nearly 57% of numDocs.

Meanwhile, the TieredMergePolicy in solrconfig.xml is still commented out:

<!--
<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">10</int>
  <int name="segmentsPerTier">10</int>
</mergePolicy>
-->

Should we enable it and wait for the effect?

Thanks!







-- 
wassalam,
[bayu]
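
With deleted docs at roughly 57% of numDocs, a lighter-weight alternative to a
full optimize is an expunging commit, which merges only segments with many
deletes. A sketch, assuming the core is reachable at /solr/mycollection:

curl "http://localhost:8080/solr/mycollection/update?commit=true&expungeDeletes=true"

This reclaims most of the deleted-document space without rewriting the entire
index the way optimize does.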


Auto optimized of Solr indexing results

2013-11-20 Thread Bayu Widyasanyata
Hi,

After successfully configuring the re-crawling script, I sometimes check
the Solr Admin page and find that the Optimized status of my collection is
not optimized (slash icon).

Hence I run the optimize step manually.

How can I make the index optimize automatically after crawling?

Should we restart Solr (I use Tomcat) as shown here [1]?

[1] http://wiki.apache.org/nutch/Crawl

Thanks!

-- 
wassalam,
[bayu]
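
One way to automate this is to append an optimize call to the end of the
re-crawl script rather than restarting Solr; a hypothetical sketch, assuming
the core name mycollection:

# at the end of the re-crawl script, after indexing finishes
curl "http://localhost:8080/solr/mycollection/update?optimize=true"

As Erick notes in the replies, though, the default merge policy usually makes
an explicit optimize unnecessary.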


Re: Auto optimized of Solr indexing results

2013-11-20 Thread Bayu Widyasanyata
Thanks Erick.
I will check that on the next round.

---
wassalam,
[bayu]

/sent from Android phone/
On Nov 20, 2013 7:45 PM, Erick Erickson erickerick...@gmail.com wrote:

 You probably shouldn't optimize at all. The default TieredMergePolicy
 will eventually purge the deleted files' data, which is really what
 optimize
 does. So despite its name, most of the time it's not really worth the
 effort.

 Take a look at your Solr admin page, the overview link for a core.
 If the number of deleted docs is a significant percentage of your
 numDocs (I typically use 20% or so, but YMMV) then optimize
 might be worthwhile. Otherwise, it's a distraction unless and until
 you have some evidence that it actually makes a difference.

 Best,
 Erick





Re: Can not find solr core on admin page after setup

2013-10-30 Thread Bayu Widyasanyata
Hi Engy,

Have you copied the Solr war (e.g. solr-4.5.1.war for the latest Solr
distribution) from the Solr binary distribution to Tomcat's webapps
directory (renamed to solr.war in the webapps dir)?

After you put that file there and restart Tomcat, it will create a 'solr'
folder under webapps.

Or, if you still find no Admin page, please check the Tomcat log
(catalina.out).
Thanks.-


On Tue, Oct 29, 2013 at 8:54 PM, engy.morsy engy.mo...@bibalex.org wrote:

 Hi,

 I set up Solr 4.2 under Apache Tomcat on a Windows machine. I created
 solr.xml under catalina/localhost that holds the solr/home path. I have
 only one core, so the solr.xml under the Solr instance looks like:

 <cores adminPath="/admin/cores" defaultCoreName="core0">
   <core name="core0" instanceDir="core0"/>
 </cores>

 After starting the Apache service, I did not find the core on the admin
 page. I checked the logs but no errors were found. I checked that the data
 folder was created successfully. I am not even able to access the core
 directly. Any idea?!

 Thanks
 Engy







-- 
wassalam,
[bayu]
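
For reference, the deployment steps described above are roughly the following;
the paths are assumptions and depend on where Tomcat and the Solr distribution
were unpacked:

cp solr-4.5.1/dist/solr-4.5.1.war $CATALINA_HOME/webapps/solr.war
$CATALINA_HOME/bin/shutdown.sh && $CATALINA_HOME/bin/startup.sh
# Tomcat expands the war into $CATALINA_HOME/webapps/solr/ on startup

In the Solr 4.x binary distribution the war file lives under dist/.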


Re: Replace document title with filename if it's empty

2013-10-29 Thread Bayu Widyasanyata
Hi Erick,

Thanks for the info.

Regards,


On Wed, Oct 30, 2013 at 8:01 AM, Erick Erickson erickerick...@gmail.com wrote:

 You can write a custom bit of update code that lives on the Solr server
 that would essentially copy the filename field to title if title wasn't
 present.

 You could write a SolrJ program that does the Tika processing and adds the
 title before you send the doc; see:
 http://searchhub.org/2012/02/14/indexing-with-solrj/

 Best,
 Erick


 




-- 
wassalam,
[bayu]
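
Before wiring up any title-copying logic, it can help to see which crawled
documents actually lack a title; a pure negative query does this. A sketch,
assuming a mycollection core and a url field as in the other threads:

curl "http://localhost:8080/solr/mycollection/select?q=-title:[*%20TO%20*]&fl=id,url&wt=json&indent=true"

-title:[* TO *] matches documents with no value in the title field, so the
returned url values show which filenames the replacement logic would need to
handle.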


Re: Solr Update URI is not found

2013-10-28 Thread Bayu Widyasanyata
On Mon, Oct 28, 2013 at 1:26 PM, Raymond Wiker rwi...@gmail.com wrote:

  request: http://localhost:8080/solr/update?wt=javabin&version=2

 I think this url is incorrect: there should be a core name between solr
 and update.


I changed the Solr URL in the crawl script's options to:

./bin/crawl urls/seed.txt TestCrawl http://localhost:8080/solr/mycollection/ 2

And the result now is Bad Request.
I will look for other misconfigured things...

=

org.apache.solr.common.SolrException: Bad Request

Bad Request

request: http://localhost:8080/solr/mycollection/update?wt=javabin&version=2
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
        at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:155)
        at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:118)
        at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:44)
        at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.close(ReduceTask.java:467)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:535)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
2013-10-28 13:30:02,804 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
        at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:123)
        at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:185)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:195)



-- 
wassalam,
[bayu]


Re: Solr Update URI is not found

2013-10-28 Thread Bayu Widyasanyata
Hi Erick and All,

The problem was solved by copying schema-solr4.xml into my collection's
Solr conf directory (renamed to schema.xml).
I didn't use Hadoop there, and I apologize; I thought it better to post on
this Solr list since the problem first appeared at the Solr indexer step.

Regarding the '/2': I think that was just how the e-mail body wrapped :)
In my first posting, that was the crawl script syntax; in my case:

# ./bin/crawl urls/seed.txt TestCrawl http://localhost:8080/solr/ 2

2 = the number of rounds.

See here:
http://wiki.apache.org/nutch/NutchTutorial#A3.3._Using_the_crawl_script

Again, thanks everyone!


On Mon, Oct 28, 2013 at 5:39 PM, Erick Erickson erickerick...@gmail.com wrote:

 This seems like a better question for the Nutch list. I see hadoop
 in there, so unless you've specifically configured solr to use
 the HDFS directory writer factory, this has to be coming from
 someplace else. And there are map/reduce tasks in here.

 BTW, it would be more helpful if you posted the URL that you
 successfully queried Solr with... What is the /2 on the end for?
 Do you use that when you query?

 Best,
 Erick






-- 
wassalam,
[bayu]
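
Putting the thread's fix together, the working invocation points the crawl
script at the core URL, with the round count as a separate argument (the core
name mycollection is the one used earlier in this thread):

./bin/crawl urls/seed.txt TestCrawl http://localhost:8080/solr/mycollection/ 2

Nutch then posts to http://localhost:8080/solr/mycollection/update, which
answers once schema.xml is in place.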


Replace document title with filename if it's empty

2013-10-28 Thread Bayu Widyasanyata
Hi,

I just found that some of the PDF files crawled have no (empty) 'title'
metadata.
How can I get the filename and use it to replace the empty 'title' field?

I didn't find a filename field in schema.xml, and I don't know how to
make this conditional (if title is empty then ...).

Thanks in advance.

-- 
wassalam,
[bayu]


Solr Update URI is not found

2013-10-27 Thread Bayu Widyasanyata
Hi,

I just installed Nutch 1.7 and the latest Solr 4.5.1 successfully.
But I got an error when executing the crawl script
(./bin/crawl urls/seed.txt TestCrawl http://localhost:8080/solr/ 2)

The error occurred at the Solr indexer step.
Following is the error in hadoop.log:

2013-10-28 06:16:59,815 WARN  mapred.LocalJobRunner - job_local1930559258_0001
org.apache.solr.common.SolrException: Not Found

Not Found

request: http://localhost:8080/solr/update?wt=javabin&version=2
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:430)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
        at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:155)
        at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:118)
        at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:44)
        at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.close(ReduceTask.java:467)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:535)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
2013-10-28 06:17:00,243 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
        at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:123)
        at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:185)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:195)

I suspect the problem is the broken ('Not Found') URI
http://localhost:8080/solr/update?wt=javabin&version=2

That URI is also reported not found when I access it from a browser directly.

Is there any configuration that I missed?

Thanks.-

-- 
wassalam,
[bayu]
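
A quick way to tell whether the 404 comes from a missing core name in the URL
is to probe both endpoints directly; a sketch, assuming the core was named
mycollection:

curl -i "http://localhost:8080/solr/update?wt=json"
curl -i "http://localhost:8080/solr/mycollection/update?wt=json"

If the first returns a Tomcat 404 and the second returns a Solr response (even
an error one), the crawl script just needs the core name in its Solr URL.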


Re: Solr Update URI is not found

2013-10-27 Thread Bayu Widyasanyata
Additional info:

- I use Tomcat 7.0.42
- Following are the Tomcat/Catalina log entries when Nutch failed at the
Solr index step. It replies with a 404 error:

10.1.160.40 - - [28/Oct/2013:08:50:02 +0700] "POST /solr/update?wt=javabin&version=2 HTTP/1.1" 404 973
10.1.160.40 - - [28/Oct/2013:08:50:02 +0700] "POST /solr/update?wt=javabin&version=2 HTTP/1.1" 404 973

Thanks.-






-- 
wassalam,
[bayu]


Re: Solr Update URI is not found

2013-10-27 Thread Bayu Widyasanyata
Hi Alex,

I can run common queries.
Below is the JSON result for a *:* query:

{
  "responseHeader": {
    "status": 0,
    "QTime": 0,
    "params": {
      "indent": "true",
      "q": "*:*",
      "_": "1382938341864",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 0,
    "start": 0,
    "docs": []
  }
}




On Mon, Oct 28, 2013 at 9:11 AM, Alexandre Rafalovitch
arafa...@gmail.com wrote:

 Can you do queries? Maybe the default collection was somehow not set up and
 you need to provide the collection name explicitly. What endpoints does the
 admin interface use when you do a query?

 Regards,
Alex.

 Personal website: http://www.outerthoughts.com/
 LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
 - Time is the quality of nature that keeps events from happening all at
 once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)






-- 
wassalam,
[bayu]


How to protect Solr 4.1 Admin page?

2013-02-14 Thread Bayu Widyasanyata
Hi,

I'm sure it's an old question...
I just want to protect the Admin page (/solr) with Basic Authentication,
but I haven't found a good answer out there yet.

I use Solr 4.1 with Apache Tomcat 7.0.35.

Could anyone give me some quick hints or links?

Thanks in advance!

-- 
wassalam,
[bayu]


Re: How to protect Solr 4.1 Admin page?

2013-02-14 Thread Bayu Widyasanyata
On Thu, Feb 14, 2013 at 3:53 PM, Gora Mohanty g...@mimirtech.com wrote:

 3. Depending on how you installed Solr, there should be a folder
 like webapps/solr/WEB-INF/ . In that folder, edit web.xml, and
 add <security-constraint> and <security-role> tags. The entries
 for the latter should match the entries in step 1.


One thing that I haven't found is the webapps/solr/WEB-INF/ folder.
I installed the binary Solr distribution.
Might it not be created until the webapp is deployed or first accessed?
I'm not sure :( since I'm also new to Tomcat deployment.

Thanks,

-- 
wassalam,
[bayu]
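
For reference, the web.xml additions Gora describes look roughly like the
following; the role and realm names are assumptions, and a matching user must
be defined in Tomcat's conf/tomcat-users.xml:

<security-constraint>
  <web-resource-collection>
    <web-resource-name>Solr Admin</web-resource-name>
    <url-pattern>/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>solr-admin</role-name>
  </auth-constraint>
</security-constraint>
<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>Solr Admin</realm-name>
</login-config>
<security-role>
  <role-name>solr-admin</role-name>
</security-role>

As for the missing WEB-INF folder: Tomcat only expands solr.war into
webapps/solr/ when the webapp is first deployed, so starting Tomcat once with
the war in place should create webapps/solr/WEB-INF/.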