[Nutch Wiki] Trivial Update of "MontyLaru" by MontyLaru

2013-03-28 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "MontyLaru" page has been changed by MontyLaru:
http://wiki.apache.org/nutch/MontyLaru

New page:
My name is Monty Larue. I live in Lower Gledfield (Great Britain).

Here is my blog; [[http://partypokerbonuscode.forumotion.com/|simply click the following article]]


[Nutch Wiki] Trivial Update of "EstelaKin" by EstelaKin

2013-03-28 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "EstelaKin" page has been changed by EstelaKin:
http://wiki.apache.org/nutch/EstelaKin

New page:
Not much to say about me at all.
Great to be a member of apache.org.
I just hope I am useful in some way here.

Also visit my web site [[http://detroitwallsparty.com/homepage|nose surgery new york city]]


[Nutch Wiki] Trivial Update of "JandmixSa" by JandmixSa

2013-03-28 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "JandmixSa" page has been changed by JandmixSa:
http://wiki.apache.org/nutch/JandmixSa

New page:
Not much to say about me really.
Great to be a member of apache.org.
I just wish I'm useful at all

my web page :: [[http://www.thearticlebeach.com/index.php?q=Real+Time+File+Sync&page=search&search.x=1&search.y=1|real time file sync]]


[jira] [Commented] (NUTCH-1547) BasicIndexingFilter - Problem to index full title

2013-03-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617048#comment-13617048
 ] 

Hudson commented on NUTCH-1547:
---

Integrated in Nutch-nutchgora #548 (See 
[https://builds.apache.org/job/Nutch-nutchgora/548/])
NUTCH-1547 BasicIndexingFilter - Problem to index full title (Revision 
1462079)

 Result = SUCCESS
fenglu : http://svn.apache.org/viewvc/nutch/branches/2.x/?view=rev&rev=1462079
Files : 
* /nutch/branches/2.x/CHANGES.txt
* /nutch/branches/2.x/conf/nutch-default.xml
* 
/nutch/branches/2.x/src/plugin/index-basic/src/java/org/apache/nutch/indexer/basic/BasicIndexingFilter.java


> BasicIndexingFilter - Problem to index full title
> -
>
> Key: NUTCH-1547
> URL: https://issues.apache.org/jira/browse/NUTCH-1547
> Project: Nutch
>  Issue Type: Bug
>  Components: indexer
>Affects Versions: 1.6, 2.1
>Reporter: Gustavo Rauber
>Assignee: lufeng
>Priority: Minor
> Fix For: 1.7, 2.2
>
> Attachments: NUTCH-1547-2x.patch, NUTCH-1547.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> I have faced this issue when trying to index the entire title, just like the 
> content, configuring its value on nutch-default.xml to -1 
> (indexer.max.title.length). I think the behavior should be the same as the 
> content.
> If you would like to fix it, just replace the line number 90:
> if (title.length() > MAX_TITLE_LENGTH) {  // truncate title if needed
> by this one:
> if (MAX_TITLE_LENGTH > -1 && title.length() > MAX_TITLE_LENGTH) {  // truncate title if needed
> Stack Trace:
> java.lang.StringIndexOutOfBoundsException: String index out of range: -1
>   at java.lang.String.substring(String.java:1937)
>   at org.apache.nutch.indexer.basic.BasicIndexingFilter.filter(BasicIndexingFilter.java:91)
>   at org.apache.nutch.indexer.IndexingFilters.filter(IndexingFilters.java:109)
>   at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:272)
>   at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:53)
>   at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:519)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
>   at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
> Cheers.
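
A minimal sketch of the guard proposed in the description above, for readers who want to see the failure mode in isolation. The class and method names (TitleTruncation, truncate) are hypothetical stand-ins, not the actual BasicIndexingFilter source; the only assumption carried over from the report is that a limit of -1 means "do not truncate".

```java
// Hypothetical stand-in for the truncation logic discussed in NUTCH-1547;
// this is not the actual BasicIndexingFilter code.
final class TitleTruncation {

    /**
     * Truncates the title to maxTitleLength characters.
     * A negative limit (e.g. -1) is treated as "no limit", mirroring the proposed check.
     */
    static String truncate(String title, int maxTitleLength) {
        // Without the "maxTitleLength > -1" test, a limit of -1 would reach
        // substring(0, -1), which throws StringIndexOutOfBoundsException.
        if (maxTitleLength > -1 && title.length() > maxTitleLength) {
            return title.substring(0, maxTitleLength);
        }
        return title;
    }

    public static void main(String[] args) {
        System.out.println(truncate("Apache Nutch", 6));
        System.out.println(truncate("Apache Nutch", -1));
    }
}
```

With the guard in place a limit of -1 returns the title unchanged, which matches the behaviour the reporter expects for the content field.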

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (NUTCH-1547) BasicIndexingFilter - Problem to index full title

2013-03-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617043#comment-13617043
 ] 

Hudson commented on NUTCH-1547:
---

Integrated in Nutch-trunk #2148 (See 
[https://builds.apache.org/job/Nutch-trunk/2148/])
NUTCH-1547 BasicIndexingFilter - Problem to index full title (Revision 
1462078)

 Result = SUCCESS
fenglu : http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1462078
Files : 
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/conf/nutch-default.xml
* 
/nutch/trunk/src/plugin/index-basic/src/java/org/apache/nutch/indexer/basic/BasicIndexingFilter.java


> BasicIndexingFilter - Problem to index full title
> -
>
> Key: NUTCH-1547
> URL: https://issues.apache.org/jira/browse/NUTCH-1547
> Project: Nutch
>  Issue Type: Bug
>  Components: indexer
>Affects Versions: 1.6, 2.1
>Reporter: Gustavo Rauber
>Assignee: lufeng
>Priority: Minor
> Fix For: 1.7, 2.2
>
> Attachments: NUTCH-1547-2x.patch, NUTCH-1547.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> I have faced this issue when trying to index the entire title, just like the 
> content, configuring its value on nutch-default.xml to -1 
> (indexer.max.title.length). I think the behavior should be the same as the 
> content.
> If you would like to fix it, just replace the line number 90:
> if (title.length() > MAX_TITLE_LENGTH) {  // truncate title if needed
> by this one:
> if (MAX_TITLE_LENGTH > -1 && title.length() > MAX_TITLE_LENGTH) {  // truncate title if needed
> Stack Trace:
> java.lang.StringIndexOutOfBoundsException: String index out of range: -1
>   at java.lang.String.substring(String.java:1937)
>   at org.apache.nutch.indexer.basic.BasicIndexingFilter.filter(BasicIndexingFilter.java:91)
>   at org.apache.nutch.indexer.IndexingFilters.filter(IndexingFilters.java:109)
>   at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:272)
>   at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:53)
>   at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:519)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
>   at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
> Cheers.



[Nutch Wiki] Trivial Update of "BonnieBoo" by BonnieBoo

2013-03-28 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "BonnieBoo" page has been changed by BonnieBoo:
http://wiki.apache.org/nutch/BonnieBoo

New page:
Jackie Stewart is the name his parents gave him although it is not his birth name.
He is a production and distribution officer. Oklahoma is the place he loves 
most. As a man what he really likes is flower arranging but he is struggling to 
find time for it. He's been working on his website for some time now. Check it 
out here: http://complaintwire.org/complaint/M2NjVfzlxQU/reputation-management


[jira] [Updated] (NUTCH-1273) Fix [deprecation] javac warnings

2013-03-28 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney updated NUTCH-1273:


Attachment: NUTCH-deprecate.patch

patch for 2.x HEAD

> Fix [deprecation] javac warnings
> 
>
> Key: NUTCH-1273
> URL: https://issues.apache.org/jira/browse/NUTCH-1273
> Project: Nutch
>  Issue Type: Sub-task
>  Components: build
>Affects Versions: nutchgora, 1.5
>Reporter: Lewis John McGibbney
>Priority: Minor
> Fix For: 1.7
>
> Attachments: NUTCH-1273-nutchgora.patch, NUTCH-1273-trunk.patch, 
> NUTCH-1273-v2-trunk.patch, NUTCH-deprecate.patch
>
>
> As part of this task, these warnings should be resolved, however this 
> particular strand of warnings can either be resolved by adding
> {code}
> @SuppressWarnings("deprecation")
> {code}
> or by actually upgrading our class usage to rely upon non-deprecated classes. 
> Which option is more appropriate for the project?
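
To make the two options in the description concrete, here is a small sketch that is not taken from the Nutch code base. It uses the deprecated java.util.Date(int, int, int) constructor purely as a convenient example of a deprecated API; the class and method names are hypothetical.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;

// Illustration only: contrasts suppressing a deprecation warning with
// replacing the deprecated call. Not Nutch code.
final class DeprecationOptions {

    // Option 1: keep the deprecated call and silence javac for this method only.
    @SuppressWarnings("deprecation")
    static Date viaDeprecatedConstructor() {
        // The deprecated constructor counts years from 1900, so 113 means 2013.
        return new Date(113, Calendar.MARCH, 28);
    }

    // Option 2: move to a non-deprecated replacement with the same result.
    static Date viaCalendar() {
        return new GregorianCalendar(2013, Calendar.MARCH, 28).getTime();
    }

    public static void main(String[] args) {
        System.out.println(viaDeprecatedConstructor().equals(viaCalendar()));
    }
}
```

Option 1 is the smaller change but leaves the dependency on the deprecated API in place; option 2 removes the warning at its source, at the cost of touching each call site.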



[jira] [Resolved] (NUTCH-1550) xercesImpl and xmlParserAPIs (org.apache.xml) packages and classes only used in three Nutch classes

2013-03-28 Thread Lewis John McGibbney (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved NUTCH-1550.
-

Resolution: Invalid

This is utter garbage. Xerces uses many org.w3c imports which we pull from 
these dependencies. I'm closing as garbage.

> xercesImpl and xmlParserAPIs (org.apache.xml) packages and classes only used 
> in three Nutch classes
> ---
>
> Key: NUTCH-1550
> URL: https://issues.apache.org/jira/browse/NUTCH-1550
> Project: Nutch
>  Issue Type: Improvement
>  Components: build, parser
>Affects Versions: 1.6, 2.1
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Minor
> Fix For: 1.7, 2.2
>
>
> DOMSerializerImpl from xerces is deprecated in our current artifact. It is 
> replaced by the (still ancient but slightly newer 
> org.apache.xml.serializer.dom3.LSSerializerImpl in [0]). 
> Upon closer inspection it seems that find . | xargs grep "org.apache.xml" * 
> only pulled up DOMBuilder, XMLCharacterRecognizer and DOMContentUtilsTest as 
> the places where such classes are used.
> I am confused as to why they are included as primary dependencies within 
> Nutch. Either these XML specific dependencies should be restricted 
> dependencies to parse-html or else they should be removed and replaced by the 
> new artifact [0].  
> [0] http://search.maven.org/#artifactdetails|xalan|serializer|2.7.1|jar



[Nutch Wiki] Trivial Update of "JeanneSmy" by JeanneSmy

2013-03-28 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "JeanneSmy" page has been changed by JeanneSmy:
http://wiki.apache.org/nutch/JeanneSmy

New page:
With every of these website you want to obtain accounts that may also be 
creating or you could have content that is really liked and similar by large 
degrees of followers.
It may well be relatively very simple on some sites, like Pinterest, Twitter, 
and Instagram for each and almost every piece of info shows how often times it 
has lately shared, even if you are not a relation or subscriber among the 
original poster.
You can let your christians to buy welcome this change products by showing the 
snapshots secured at the several states of it has manufacturing procedures for 
it will make buyers more satisfied and consequently help them - trust you. If 
you are starting a business favor hair design, home decoration, landscaping, 
muscles repairing, decoration or related things right after that Instagram is 
suitable for you.

The biggest advantage social advertisers can play from a successful business 
cooperation is attracting workers not directly related to your company to 
activate with followers involving their social graph whilst relates to item or 
service or service one is promoting.
In the instance that social media advertising and advertising is done well, the 
community most importantly will champion marketing and services, thus driving 
new agency leads for the main reason partners while essentially act as the 
entire orchestrator.
The entire "big three" in social interaction services continue to end up 
Facebook, Twitter and as well as LinkedIn. However, in the previous year, 
relative beginners Pinterest and Instagram are quickly simply being almost as 
popular. There's also Google+, but not one of the major research suppliers have 
yet being privy to all of the statistics related to this site. Yet another site 
of note is the earlier MySpace, which has been purchased that undergoing a 
extensive renovation to help it viable once again.
If you have so many photos happy for sharing don't post them almost always. 
Remember Instagram is a global community and is truly busy all time. So 
regarding get exposed to global audience anywhere from other parts related with 
world share every single and every photos with step 2 hours of time 
interval.
Commonly do not post pictures at their night time. When you deliver photos at 
basic intervals you will come to bear in mind your followers apply. That is 
without a doubt you will acknowledge at what time most of your favorite 
followers are via the web.
Once you find that you can share your video / photo on that confident timing to 
obtain maximum likes, comment forms and followers. This is the actual easiest 
way to get [[http://www.buyigservices.com|buyigservices]].
The very newsfeed includes subject material from all your new linked social 
media (currently supporting Facebook Pinterest, Twitter while even Instagram) 
and the best part is usually you can oftentimes let it prove organized 
automatically or perhaps even do it manually.

Since Klout rates people based on often the impact on your trusty sphere of 
influence, it's interesting which will analyze the energy some people maintain 
on not only the online world, but on the world in conventional. According so 
that it will Klout, Lady Coo has a rate of 90 as well as , claims that rachel 
is influential about Music and Fashion, but more importantly, that she impacts 
roughly five billion dollars people. Sufficient reason for the most twitter 
followers on Twitter, your company might expect you see, the score to be 
higher, but Klout doesn't just look at numbers when determining the score, it 
looks at the quality related with content.

Consider popular Hashtags Hashtags in a aficionado shell is the notion that let 
the subscribers of Instagram that can group a number of photos together. One 
sure way of making certain that the photographs receive cash publicity is 
through the process of tagging them.
For example, if you possess a sun picture, you will be able to tag it using a 
popular #sunset content label so that it will be available with other photos 
that have the identical tag (twitter buyers?). The theory is that people might 
search in points that interest all of #cat #icecream #olympics.

http://chatto.robertobifulco.it/node/24592
http://courses.aaph.org/node/1033
http://biomed.kiev.ua/nonlinearLab/node/4873

my web blog; [[http://www.neworleansbacteriafree.com/node/96771|can you buy instagram followers]]


[jira] [Created] (NUTCH-1550) xercesImpl and xmlParserAPIs (org.apache.xml) packages and classes only used in three Nutch classes

2013-03-28 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-1550:
---

 Summary: xercesImpl and xmlParserAPIs (org.apache.xml) packages 
and classes only used in three Nutch classes
 Key: NUTCH-1550
 URL: https://issues.apache.org/jira/browse/NUTCH-1550
 Project: Nutch
  Issue Type: Improvement
  Components: build, parser
Affects Versions: 2.1, 1.6
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
Priority: Minor
 Fix For: 1.7, 2.2


DOMSerializerImpl from xerces is deprecated in our current artifact. It is 
replaced by the (still ancient but slightly newer 
org.apache.xml.serializer.dom3.LSSerializerImpl in [0]). 
Upon closer inspection it seems that find . | xargs grep "org.apache.xml" * 
only pulled up DOMBuilder, XMLCharacterRecognizer and DOMContentUtilsTest as 
the places where such classes are used.
I am confused as to why they are included as primary dependencies within Nutch. 
Either these XML specific dependencies should be restricted dependencies to 
parse-html or else they should be removed and replaced by the new artifact [0]. 
 
[0] http://search.maven.org/#artifactdetails|xalan|serializer|2.7.1|jar



[Nutch Wiki] Trivial Update of "RochelleS" by RochelleS

2013-03-28 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "RochelleS" page has been changed by RochelleS:
http://wiki.apache.org/nutch/RochelleS

New page:
Got nothing to write about myself really.
Enjoying to be a member of apache.org.
I really wish Im useful in one way here.

Here is my web site :: [[http://mp3sdown.com/music/download/1/the-script.html|Suggested Looking at]]


[Nutch Wiki] Trivial Update of "JerriHarr" by JerriHarr

2013-03-28 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "JerriHarr" page has been changed by JerriHarr:
http://wiki.apache.org/nutch/JerriHarr

New page:
Hi, I am Jake Hy. What I really enjoy carrying out is to climb and I'll is 
starting something else along with keep in mind this. I work as a good solid 
computer operator though I plan on changing it. Ct is the place I love a large 
percentage of but I may have to move in a year or two. Check out my net site 
here: 
http://www.beamload.info/story.php?title=pangrango-2-hotel-indonesia-mega-travel

my web site ... [[http://gof2wiki.pf-control.de/mediawiki-1.18.1/index.php?title=Benutzer:JennyT49|Batam Indonesia Tourist Attractions]]


[Nutch Wiki] Trivial Update of "JulietSae" by JulietSae

2013-03-28 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "JulietSae" page has been changed by JulietSae:
http://wiki.apache.org/nutch/JulietSae

New page:
There is nothing to write about me at all.
Yes! Im a part of apache.org.
I just hope I am useful at all

Here is my blog post ... [[http://6monthscarinsurance.co.uk/one-months-car-insurance.html|one month of car insurance]]


[Nutch Wiki] Trivial Update of "MelbaHilt" by MelbaHilt

2013-03-28 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "MelbaHilt" page has been changed by MelbaHilt:
http://wiki.apache.org/nutch/MelbaHilt

New page:
Nothing to say about myself I think.
Enjoying to be a part of apache.org.
I just hope Im useful in some way here.

Feel free to surf to my site: [[https://Servulo.us/|additional reading]]


[Nutch Wiki] Trivial Update of "WERMark" by WERMark

2013-03-28 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "WERMark" page has been changed by WERMark:
http://wiki.apache.org/nutch/WERMark

New page:
I am 19 years old and my name is Mark Mcqueen.

I live in Sells (United States).

Here is my weblog ... [[http://wenzlitschke.de/index.php?title=Benutzer:LizetteCo|link]]


[Nutch Wiki] Trivial Update of "SangA45" by SangA45

2013-03-28 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "SangA45" page has been changed by SangA45:
http://wiki.apache.org/nutch/SangA45

New page:
Nothing to tell about myself at all.
Finally a member of apache.org.
I really hope I'm useful in one way here.

Here is my web page [[http://danielelawson15.soulcast.com/1/Get-Freebies-and-Rewards-by-way-of-Free-Microsoft-Points-|free microsoft points]]


[jira] [Commented] (NUTCH-1538) tuning of loaded fields during fetcherJob start-up

2013-03-28 Thread Roland von Herget (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616621#comment-13616621
 ] 

Roland von Herget commented on NUTCH-1538:
--

Hi Lufeng,

I'm not sure if I understood your point correctly, but if you mean that some 
3rd party plugin may use these fields:

1) In a normal workflow it would be like this:
- fetcher startup
- fetcher gets content via http and stores it to DB
- fetcher shutdown
- parser startup
- parser loads content from DB, parses, store parsed data in DB
- parser shutdown

2) In this discussed workflow (original code):
- fetcher startup
- fetcher loads content from DB
- fetcher gets _new_ content via http (overwriting loaded content from DB)
- fetcher runs parser and stores all in DB
- fetcher shutdown

With my patch, we just touch workflow 2) - skipping step 2 "loading content 
from db".
Every field we load in 2)/step 2 should be overwritten by step 3, if not 
workflow 1) can't work.

I know, this is not backed by a complete knowledge of the code, but from a 
logic point of view it makes sense to me ;)
Just my 2 cents.

> tuning of loaded fields during fetcherJob start-up
> --
>
> Key: NUTCH-1538
> URL: https://issues.apache.org/jira/browse/NUTCH-1538
> Project: Nutch
>  Issue Type: Improvement
>  Components: fetcher
>Affects Versions: 2.1
> Environment: nutch 2.1 / cassandra 1.2.1 / gora-cassandra 0.2 / 
> gora-core 0.2.1 
> running fetch with parse=true
>Reporter: Roland von Herget
> Attachments: NUTCH-1538-FetcherJob-v1.patch
>
>
> Main problem is, nutch is loading nearly every row & column from DB during 
> startup of a fetcherJob when fetcher.parse=true.
> A parserJob needs e.g. the CONTENT field from db, to parse.
> The fetcherJob adds all fields of the parserJob to its needed fields, if 
> running with fetcher.parse=true. [FetcherJob.getFields()]
> If the nutch configuration saves all fetched data to DB 
> (fetcher.store.content=true) you'll end up loading GBs of unused content 
> during fetcherJob start-up.
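
The field-selection behaviour described above can be sketched roughly as follows. This is one possible reading of the issue, not the attached patch: the Field enum, the two field sets, and the method names are all hypothetical, and the real Nutch 2.x schema and FetcherJob.getFields() look different.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical model of which columns a fetch job asks the store to load.
// Field names and set contents are assumptions made for illustration.
final class FetchFieldSelection {

    enum Field { BASE_URL, FETCH_TIME, STATUS, CONTENT, TEXT, TITLE }

    static final Set<Field> FETCHER_FIELDS =
            EnumSet.of(Field.BASE_URL, Field.FETCH_TIME, Field.STATUS);
    static final Set<Field> PARSER_FIELDS =
            EnumSet.of(Field.CONTENT, Field.TEXT, Field.TITLE);

    /** Behaviour reported in the issue: with parsing enabled, parser fields are loaded too. */
    static Set<Field> fieldsAsReported(boolean parse) {
        EnumSet<Field> fields = EnumSet.copyOf(FETCHER_FIELDS);
        if (parse) {
            fields.addAll(PARSER_FIELDS); // pulls stored CONTENT into memory at start-up
        }
        return fields;
    }

    /** Proposed idea: the fresh fetch overwrites parser inputs, so do not pre-load them. */
    static Set<Field> fieldsAsProposed() {
        return EnumSet.copyOf(FETCHER_FIELDS);
    }

    public static void main(String[] args) {
        System.out.println(fieldsAsReported(true));
        System.out.println(fieldsAsProposed());
    }
}
```

The saving comes entirely from leaving the large CONTENT column out of the start-up query; whether any third-party plugin relies on the pre-loaded values is the open question raised in the comment.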



[Nutch Wiki] Trivial Update of "PollyBosw" by PollyBosw

2013-03-28 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "PollyBosw" page has been changed by PollyBosw:
http://wiki.apache.org/nutch/PollyBosw

New page:
My Dad called me: Polly Boswell
Age today: 18
Country: Australia
Home town: Tailem Bend
ZIP: 5259
Address: 53 Hereford Avenue

Here is my web blog :: [[http://www.southdownsdiscovery.com/forum//index.php//index.php//index.php//index.php/index.php?page=User&userID=1399|mouse click the next page]]


[jira] [Commented] (NUTCH-1549) Fix deprecated use of Tika MimeType API in o.a.n.util.MimeUtil

2013-03-28 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616600#comment-13616600
 ] 

Lewis John McGibbney commented on NUTCH-1549:
-

I opened this as another issue and linked to others as this issue may end up 
upgrading more within MimeUtil (and further afield) than I expect.

> Fix deprecated use of Tika MimeType API in o.a.n.util.MimeUtil 
> ---
>
> Key: NUTCH-1549
> URL: https://issues.apache.org/jira/browse/NUTCH-1549
> Project: Nutch
>  Issue Type: Bug
>  Components: build
>Affects Versions: 1.6, 2.1
>Reporter: Lewis John McGibbney
> Fix For: 1.7, 2.2
>
>
> There are still problems with this issue (which actually builds on the 
> work undertaken by [~markus17] on NUTCH-1230). I meant to mention and address 
> them ages ago and they recently resurfaced whilst tackling NUTCH-1273. The 
> following code is deprecated 
> {code}
> // If no mime-type header, or cannot find a corresponding registered
> // mime-type, then guess a mime-type from the url pattern
> type = this.mimeTypes.getMimeType(url) != null ? this.mimeTypes
>     .getMimeType(url) : type;
> }
> {code}
> Thanks to Nick Burch over on Tika, I attempted to upgrade it to the following
> {code}
> String mt = getMimeType(url);
> type = mt != null ? mt : type;
> {code} 
> Which will of course not compile as the javac rightly flags incompatible 
> types as the error.
> This is present in both trunk and 2.x and we should address it once and for 
> all.



[jira] [Created] (NUTCH-1549) Fix deprecated use of Tika MimeType API in o.a.n.util.MimeUtil

2013-03-28 Thread Lewis John McGibbney (JIRA)
Lewis John McGibbney created NUTCH-1549:
---

 Summary: Fix deprecated use of Tika MimeType API in 
o.a.n.util.MimeUtil 
 Key: NUTCH-1549
 URL: https://issues.apache.org/jira/browse/NUTCH-1549
 Project: Nutch
  Issue Type: Bug
  Components: build
Affects Versions: 2.1, 1.6
Reporter: Lewis John McGibbney
 Fix For: 1.7, 2.2


There are still problems with this issue (which actually builds on the 
work undertaken by [~markus17] on NUTCH-1230). I meant to mention and address 
them ages ago and they recently resurfaced whilst tackling NUTCH-1273. The 
following code is deprecated 
{code}
// If no mime-type header, or cannot find a corresponding registered
// mime-type, then guess a mime-type from the url pattern
type = this.mimeTypes.getMimeType(url) != null ? this.mimeTypes
    .getMimeType(url) : type;
}
{code}

Thanks to Nick Burch over on Tika, I attempted to upgrade it to the following

{code}
String mt = getMimeType(url);
type = mt != null ? mt : type;
{code} 

Which will of course not compile as the javac rightly flags incompatible types 
as the error.

This is present in both trunk and 2.x and we should address it once and for all.
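
As a sketch of why the attempted upgrade fails and one way it might be reconciled: presumably the surrounding variable holds a MIME type object while the new lookup returns a plain String. The types and methods below (MimeType, detectByName, resolve) are self-contained stand-ins written for this illustration; they are not Tika's classes, and the real fix would need to use whatever conversion Tika actually provides.

```java
// Stand-in types only; this is not Tika's API.
final class MimeTypeFallback {

    // Hypothetical rich MIME type object, playing the role of the existing variable's type.
    static final class MimeType {
        private final String name;
        MimeType(String name) { this.name = name; }
        String getName() { return name; }
    }

    // Hypothetical name-based lookup that returns a plain String, or null when unknown.
    static String detectByName(String url) {
        if (url.endsWith(".html")) return "text/html";
        if (url.endsWith(".pdf")) return "application/pdf";
        return null;
    }

    // Keeps the previously resolved type unless the URL yields a guess.
    static MimeType resolve(MimeType type, String url) {
        String guess = detectByName(url);
        // Writing "return guess != null ? guess : type;" would mix String and MimeType,
        // which is the kind of incompatible-types error javac reports.
        // The String has to be converted back into a MimeType first.
        return guess != null ? new MimeType(guess) : type;
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, "http://example.com/a.pdf").getName());
    }
}
```

The point of the sketch is only that the String result needs an explicit conversion step before it can replace the old assignment.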





Re: Important : Bunch of Spam Created under Nutch Wiki!!

2013-03-28 Thread kiran chitturi
Thank you Chris. I have posted the message in freenode and filed a JIRA
https://issues.apache.org/jira/browse/INFRA-6081


On Thu, Mar 28, 2013 at 3:18 PM, Mattmann, Chris A (388J) <
chris.a.mattm...@jpl.nasa.gov> wrote:

> Hi Kiran,
>
> Yes, my recommendation:
>
> 1. Jump into #asfinfra on freenode, find Joe, or Gavin or Daniel,
> ask for help. If you don't have IRC, email infrastruct...@apache.org
> and/or file a https://issues.apache.org/jira/browse/INFRA ticket
>
> 2. Request that they enable ContributorsGroup-only ACLs ASAP
>
> I know that many Apache wikis (MoinMoin) are being attacked...
>
> Cheers,
> Chris
>
>
> ++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattm...@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++
>
>
>
>
> -Original Message-
> From: kiran chitturi 
> Reply-To: "dev@nutch.apache.org" 
> Date: Thursday, March 28, 2013 12:15 PM
> To: "dev@nutch.apache.org" 
> Subject: Fwd: Important : Bunch of Spam Created under Nutch Wiki!!
>
> >Thanks to Ken (check message below) for reporting our insecure wiki. I
> >have checked it and anyone can create a fake account and edit any of our
> >wiki pages or create new ones.
> >
> >
> >When I first registered on the wiki, all the pages were immutable and
> >Lewis had to add me to the Contributors group to make changes to the wiki.
> >
> >
> >Probably the setting has been hacked, and that is the reason we are
> >facing a lot of spam.
> >
> >
> >Can we contact infra@apache and request them to lock down the wiki as
> >the other groups did?
> >
> >
> >
> >
> >-- Forwarded message --
> >From: Ken Krugler 
> >Date: Thu, Mar 28, 2013 at 1:35 PM
> >Subject: Re: Important : Bunch of Spam Created under Nutch Wiki!!
> >To: dev@nutch.apache.org
> >
> >
> >Hi Kiran,
> >
> >On Mar 28, 2013, at 2:03am, kiran chitturi wrote:
> >
> >
> >Thank you Ken for the information. I think the access is already
> >restricted to Contributors Only. Someone can please confirm, if it is
> >not.
> >
> >
> >
> >
> >
> >It's not, as far as I know. I just created a fake account, logged in with
> >it, and edited the front page.
> >
> >
> >If anyone needs to edit wiki, they would need to ask someone to get
> >access to wiki pages.
> >
> >
> >Do you know if Solr still got hit by spam after locking down the wiki ?
> >
> >
> >
> >
> >
> >
> >I think that change helped cut down most of the spam, but I don't monitor
> >the Solr list that closely, sorry.
> >
> >
> >-- Ken
> >
> >
> >
> >
> >
> >
> >On Thu, Mar 28, 2013 at 1:40 AM, Ken Krugler
> > wrote:
> >
> >
> >
> >On Mar 27, 2013, at 6:54pm, kiran chitturi wrote:
> >
> >
> >Thank you Binoy for reporting.
> >
> >
> >We have been monitoring the pages and deleting them when we get time but
> >there are more coming up. Today, I saw a spam edit on the home
> >page of the Nutch wiki. It inserted spam links under tutorials.
> >
> >
> >We need to find a permanent solution to this. I wonder if any other
> >list-servs are facing the same issue.
> >
> >
> >
> >
> >
> >
> >Yes - Solr recently had to lock down editing on their wiki:
> >
> >
> >
> >The wiki at http://wiki.apache.org/solr/ has come under attack by
> >spammers more frequently of late, so the PMC has decided to lock it down
> > in an attempt to reduce the work involved in tracking and removing spam.
> >
> >From now on, only people who appear on
> >http://wiki.apache.org/solr/ContributorsGroup will be able to
> >create/modify/delete wiki pages.
> >
> >Please request either on the solr-u...@lucene.apache.org or on
> >d...@lucene.apache.org to have your wiki username added to the
> >ContributorsGroup
> > page - this is a one-time step.
> >
> >
> >
> >
> >So I think you need to make a request to Infra to lock down the wiki,
> >then add people (generally in response to explicit requests) to the
> >ContributorsGroup page.
> >
> >
> >-- Ken
> >
> >
> >
> >
> >
> >
> >On Thu, Mar 28, 2013 at 12:49 AM, Binoy d
> > wrote:
> >
> >I am quite surprised looking at the notifications I am getting for new
> >pages on the Nutch Wiki.
> >Example:
> >http://wiki.apache.org/nutch/KarlPuent
> >
> >I see at least 25-35 emails regarding such notifications.
> >
> >All of the links I got are rooted under
> >http://wiki.apache.org/nutch/
> >
> >
> >Is someone looking into this? If needed, I can gladly forward the emails to
> >the person cleaning it up, as I am not sure if everyone has access to
> >delete the pages.
> >
> >Regards,
> >b
> >
> >-- Forwarded message --
> >From: Apache Wiki 
> >Date: Wed, Mar 27,

[jira] [Commented] (NUTCH-1501) Harmonize behavior of parsechecker and indexchecker

2013-03-28 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616572#comment-13616572
 ] 

Lewis John McGibbney commented on NUTCH-1501:
-

+1 Seb

> Harmonize behavior of parsechecker and indexchecker
> ---
>
> Key: NUTCH-1501
> URL: https://issues.apache.org/jira/browse/NUTCH-1501
> Project: Nutch
>  Issue Type: Improvement
>  Components: indexer, parser
>Reporter: Sebastian Nagel
>Assignee: Lewis John McGibbney
>Priority: Minor
> Fix For: 2.2
>
> Attachments: NUTCH-1501-2.x.patch, NUTCH-1501-2.x-v2.patch, 
> NUTCH-1501-trunk.patch, NUTCH-1501-trunk-v2.patch
>
>
> Behaviour of ParserChecker and IndexingFiltersChecker has diverged between 
> trunk and 2.x
> - missing in 2.x: NUTCH-1320, NUTCH-1207
> - open issue to be also applied to 2.x: NUTCH-1419, NUTCH-1389

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Important : Bunch of Spam Created under Nutch Wiki!!

2013-03-28 Thread Mattmann, Chris A (388J)
Hi Kiran, 

Yes, my recommendation:

1. Jump into #asfinfra on freenode, find Joe, Gavin or Daniel, and
ask for help. If you don't have IRC, email infrastruct...@apache.org
and/or file a ticket at https://issues.apache.org/jira/browse/INFRA

2. Request that they enable ContributorsGroup-only ACLs ASAP

I know that many Apache wikis (MoinMoin) are being attacked...

Cheers,
Chris


++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++




-Original Message-
From: kiran chitturi 
Reply-To: "dev@nutch.apache.org" 
Date: Thursday, March 28, 2013 12:15 PM
To: "dev@nutch.apache.org" 
Subject: Fwd: Important : Bunch of Spam Created under Nutch Wiki!!

>Thanks to Ken (see the message below) for reporting our insecure wiki. I
>have checked, and anyone can create a fake account and edit any of our
>wiki pages or create new ones.
>
>
>When I first registered on the wiki, all the pages were immutable and
>Lewis had to add me to the Contributors group before I could make changes.
>
>
>Probably the setting was hacked, and that is the reason we are
>facing a lot of spam.
>
>
>Can we contact infra@apache and ask them to lock down the wiki as
>the other groups did?
>
>
>
>
>-- Forwarded message --
>From: Ken Krugler 
>Date: Thu, Mar 28, 2013 at 1:35 PM
>Subject: Re: Important : Bunch of Spam Created under Nutch Wiki!!
>To: dev@nutch.apache.org
>
>
>Hi Kiran,
>
>On Mar 28, 2013, at 2:03am, kiran chitturi wrote:
>
>
>Thank you Ken for the information. I think the access is already
>restricted to Contributors Only. Someone can please confirm, if it is
>not. 
>
>
>
>
>
>It's not, as far as I know. I just created a fake account, logged in with
>it, and edited the front page.
>
>
>If anyone needs to edit wiki, they would need to ask someone to get
>access to wiki pages.
>
>
>Do you know if Solr still got hit by spam after locking down the wiki ?
>
>
>
>
>
>
>I think that change helped cut down most of the spam, but I don't monitor
>the Solr list that closely, sorry.
>
>
>-- Ken
>
>
>
>
>
>
>On Thu, Mar 28, 2013 at 1:40 AM, Ken Krugler
> wrote:
>
>
>
>On Mar 27, 2013, at 6:54pm, kiran chitturi wrote:
>
>
>Thank you Binoy for reporting.
>
>
>We have been monitoring the pages and deleting them when we get time but
>there are more coming up. Today, I have seen a spam editing on the home
>page of Nutch wiki. It has inserted spam links under tutorials.
>
>
>We need to find a permanent solution to this. I wonder if any other
>list-servs are facing the same issue.
>
>
>
>
>
>
>Yes - Solr recently had to lock down editing on their wiki:
>
>
>
>The wiki at http://wiki.apache.org/solr/ has come under attack by
>spammers more frequently of late, so the PMC has decided to lock it down
> in an attempt to reduce the work involved in tracking and removing spam.
>
>From now on, only people who appear on
>http://wiki.apache.org/solr/ContributorsGroup will be able to
>create/modify/delete wiki pages.
>
>Please request either on the solr-u...@lucene.apache.org or on
>d...@lucene.apache.org to have your wiki username added to the
>ContributorsGroup
> page - this is a one-time step.
>
>
>
>
>So I think you need to make a request to Infra to lock down the wiki,
>then add people (generally in response to explicit requests) to the
>ContributorsGroup page.
>
>
>-- Ken
>
>
>
>
>
>
>On Thu, Mar 28, 2013 at 12:49 AM, Binoy d
> wrote:
>
>I am quite surprised by the notifications I am getting for new
>pages on the Nutch Wiki.
>Example:
>http://wiki.apache.org/nutch/KarlPuent
>
>I have seen at least 25-35 such notification emails.
>
>All of the links I got are rooted under
>http://wiki.apache.org/nutch/
>
>
>Is someone looking into this? If needed, I can gladly forward the emails to
>the person cleaning it up, as I am not sure whether everyone has access to
>delete the pages.
>
>Regards,
>b
>
>-- Forwarded message --
>From: Apache Wiki 
>Date: Wed, Mar 27, 2013 at 9:32 PM
>Subject: [Nutch Wiki] Trivial Update of "EdwinaBro" by EdwinaBro
>To: Apache Wiki 
>
>
>Dear Wiki user,
>
>You have subscribed to a wiki page or wiki category on "Nutch Wiki" for
>change notification.
>
>The "EdwinaBro" page has been changed by EdwinaBro:
>http://wiki.apache.org/nutch/EdwinaBro
>
>New page:
>I am 24 years old and my name is Edwina Brownlee. I life in Corjolens
>(Switzerland).<>
><>
><>
>Take a look at my web-site ... [[http://modform.org/SolomonKr|Continue
>]]
>
>
>
>
>
>
>
>
>
>-- 
>Kiran Chi

Fwd: Important : Bunch of Spam Created under Nutch Wiki!!

2013-03-28 Thread kiran chitturi
Thanks to Ken (see the message below) for reporting our insecure wiki. I have
checked, and anyone can create a fake account and edit any of our wiki
pages or create new ones.

When I first registered on the wiki, all the pages were immutable and Lewis
had to add me to the Contributors group before I could make changes.

Probably the setting was hacked, and that is the reason we are
facing a lot of spam.

Can we contact infra@apache and ask them to lock down the wiki as
the other groups did?


-- Forwarded message --
From: Ken Krugler 
Date: Thu, Mar 28, 2013 at 1:35 PM
Subject: Re: Important : Bunch of Spam Created under Nutch Wiki!!
To: dev@nutch.apache.org


Hi Kiran,

On Mar 28, 2013, at 2:03am, kiran chitturi wrote:

Thank you Ken for the information. I think the access is already restricted
to Contributors Only. Someone can please confirm, if it is not.


It's not, as far as I know. I just created a fake account, logged in with
it, and edited the front page.

If anyone needs to edit wiki, they would need to ask someone to get access
to wiki pages.

Do you know if Solr still got hit by spam after locking down the wiki ?


I think that change helped cut down most of the spam, but I don't monitor
the Solr list that closely, sorry.

-- Ken



On Thu, Mar 28, 2013 at 1:40 AM, Ken Krugler wrote:

>
> On Mar 27, 2013, at 6:54pm, kiran chitturi wrote:
>
> Thank you Binoy for reporting.
>
> We have been monitoring the pages and deleting them when we get time but
> there are more coming up. Today, I have seen a spam editing on the home
> page of Nutch wiki. It has inserted spam links under tutorials.
>
> We need to find a permanent solution to this. I wonder if any other
> list-servs are facing the same issue.
>
>
> Yes - Solr recently had to lock down editing on their wiki:
>
> The wiki at http://wiki.apache.org/solr/ has come under attack by
> spammers more frequently of late, so the PMC has decided to lock it down in
> an attempt to reduce the work involved in tracking and removing spam.
>
> From now on, only people who appear on
> http://wiki.apache.org/solr/ContributorsGroup will be able to
> create/modify/delete wiki pages.
>
> Please request either on the solr-u...@lucene.apache.org or on
> d...@lucene.apache.org to have your wiki username added to the
> ContributorsGroup page - this is a one-time step.
>
>
> So I think you need to make a request to Infra to lock down the wiki, then
> add people (generally in response to explicit requests) to the
> ContributorsGroup page.
>
> -- Ken
>
>
>
>
> On Thu, Mar 28, 2013 at 12:49 AM, Binoy d  wrote:
>
>> I am quite surprised by the notifications I am getting for new
>> pages on the Nutch Wiki.
>> Example:
>> http://wiki.apache.org/nutch/KarlPuent
>>
>> I have seen at least 25-35 such notification emails.
>>
>> All of the links I got are rooted under http://wiki.apache.org/nutch/
>>
>>
>> Is someone looking into this? If needed, I can gladly forward the emails to
>> the person cleaning it up, as I am not sure whether everyone has access to
>> delete the pages.
>>
>> Regards,
>> b
>>
>> -- Forwarded message --
>> From: Apache Wiki 
>> Date: Wed, Mar 27, 2013 at 9:32 PM
>> Subject: [Nutch Wiki] Trivial Update of "EdwinaBro" by EdwinaBro
>> To: Apache Wiki 
>>
>>
>> Dear Wiki user,
>>
>> You have subscribed to a wiki page or wiki category on "Nutch Wiki" for
>> change notification.
>>
>> The "EdwinaBro" page has been changed by EdwinaBro:
>> http://wiki.apache.org/nutch/EdwinaBro
>>
>> New page:
>> I am 24 years old and my name is Edwina Brownlee. I life in Corjolens
>> (Switzerland).<>
>> <>
>> <>
>> Take a look at my web-site ... [[http://modform.org/SolomonKr|Continue]]
>>
>>
>
>
> --
> Kiran Chitturi
>
> 
>
>
>
>--
>  Ken Krugler
> +1 530-210-6378
> http://www.scaleunlimited.com
> custom big data solutions & training
> Hadoop, Cascading, Cassandra & Solr
>
>
>
>
>
>


-- 
Kiran Chitturi





   --
 Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr








-- 
Kiran Chitturi




[Nutch Wiki] Trivial Update of "AugustGla" by AugustGla

2013-03-28 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "AugustGla" page has been changed by AugustGla:
http://wiki.apache.org/nutch/AugustGla

New page:
Shut friends call him Rupert McLendon but he doesn't like persons use his name. 
One of the things that he or she loves most will be always to play golf but 
nevertheless , he's been juggling new things just recently.<>
<>
For years he's been working as being a manager. Oregon was his birth set up. 
I've been working on my website for some a chance now. Check it out here: 
http://www.websitelinkadvertising.com/News/1000-dream-bungalow-restaurant-indonesia-mega-travel/<>
<>
Feel free to surf to my homepage: 
[[http://www.linksdelicious.com/story.php?title=airasia-big-global-loyalty-programme-is-now-free-indonesia-mega-travel|The
 Beverly Hills Bali Hotel]]


Re: Important : Bunch of Spam Created under Nutch Wiki!!

2013-03-28 Thread Ken Krugler
Hi Kiran,

On Mar 28, 2013, at 2:03am, kiran chitturi wrote:

> Thank you Ken for the information. I think the access is already restricted 
> to Contributors Only. Someone can please confirm, if it is not. 

It's not, as far as I know. I just created a fake account, logged in with it, 
and edited the front page.

> If anyone needs to edit wiki, they would need to ask someone to get access to 
> wiki pages. 
> 
> Do you know if Solr still got hit by spam after locking down the wiki ?

I think that change helped cut down most of the spam, but I don't monitor the 
Solr list that closely, sorry.

-- Ken



> On Thu, Mar 28, 2013 at 1:40 AM, Ken Krugler  
> wrote:
> 
> On Mar 27, 2013, at 6:54pm, kiran chitturi wrote:
> 
>> Thank you Binoy for reporting.
>> 
>> We have been monitoring the pages and deleting them when we get time but 
>> there are more coming up. Today, I have seen a spam editing on the home page 
>> of Nutch wiki. It has inserted spam links under tutorials.
>> 
>> We need to find a permanent solution to this. I wonder if any other 
>> list-servs are facing the same issue.
> 
> Yes - Solr recently had to lock down editing on their wiki:
> 
>> The wiki at http://wiki.apache.org/solr/ has come under attack by spammers 
>> more frequently of late, so the PMC has decided to lock it down in an 
>> attempt to reduce the work involved in tracking and removing spam.
>> 
>> From now on, only people who appear on 
>> http://wiki.apache.org/solr/ContributorsGroup will be able to 
>> create/modify/delete wiki pages.
>> 
>> Please request either on the solr-u...@lucene.apache.org or on 
>> d...@lucene.apache.org to have your wiki username added to the 
>> ContributorsGroup page - this is a one-time step.
> 
> So I think you need to make a request to Infra to lock down the wiki, then 
> add people (generally in response to explicit requests) to the 
> ContributorsGroup page.
> 
> -- Ken
> 
> 
>> 
>> 
>> On Thu, Mar 28, 2013 at 12:49 AM, Binoy d  wrote:
>> I am quite surprised by the notifications I am getting for new pages 
>> on the Nutch Wiki.
>> Example:
>> http://wiki.apache.org/nutch/KarlPuent
>> 
>> I have seen at least 25-35 such notification emails.
>> 
>> All of the links I got are rooted under http://wiki.apache.org/nutch/
>> 
>> 
>> Is someone looking into this? If needed, I can gladly forward the emails to the 
>> person cleaning it up, as I am not sure whether everyone has access to delete the 
>> pages.
>> 
>> Regards,
>> b
>> 
>> -- Forwarded message --
>> From: Apache Wiki 
>> Date: Wed, Mar 27, 2013 at 9:32 PM
>> Subject: [Nutch Wiki] Trivial Update of "EdwinaBro" by EdwinaBro
>> To: Apache Wiki 
>> 
>> 
>> Dear Wiki user,
>> 
>> You have subscribed to a wiki page or wiki category on "Nutch Wiki" for 
>> change notification.
>> 
>> The "EdwinaBro" page has been changed by EdwinaBro:
>> http://wiki.apache.org/nutch/EdwinaBro
>> 
>> New page:
>> I am 24 years old and my name is Edwina Brownlee. I life in Corjolens 
>> (Switzerland).<>
>> <>
>> <>
>> Take a look at my web-site ... [[http://modform.org/SolomonKr|Continue]]
>> 
>> 
>> 
>> 
>> -- 
>> Kiran Chitturi
>> 
>> 
>> 
>> 
> 
> --
> Ken Krugler
> +1 530-210-6378
> http://www.scaleunlimited.com
> custom big data solutions & training
> Hadoop, Cascading, Cassandra & Solr
> 
> 
> 
> 
> 
> 
> 
> 
> -- 
> Kiran Chitturi
> 
> 
> 
> 

--
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr







[Nutch Wiki] Trivial Update of "PatsyWeir" by PatsyWeir

2013-03-28 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "PatsyWeir" page has been changed by PatsyWeir:
http://wiki.apache.org/nutch/PatsyWeir

New page:
Hello there. Let me start off by introducing the creator, his title is Faustino 
but folks constantly misspell it.<>
Carrying out 3d graphics is anything he seriously enjoys executing. Soon after 
getting out of his career for decades he became a cashier but he is constantly 
needed his personal organization. Arizona is the only spot he's been residing 
in and his mother and father live close by.<>
<>
Also visit my website :: 
[[http://www.tvinstallationsandiego.com/story.php?title=cebu-pacific-airbus-a320-philippine-tvc|Suggested
 Internet site]]


[Nutch Wiki] Update of "FrontPage" by BogusUser

2013-03-28 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "FrontPage" page has been changed by BogusUser:
http://wiki.apache.org/nutch/FrontPage?action=diff&rev1=266&rev2=267

   * [[Presentations]] on Nutch
   * Press [[Articles]]
   * [[Evaluations]] of Search Quality
-  * Commercial [[Support]] and developers for hire
+  * Commercial [[Support]] & developers for hire
   * [[Mailing]] Lists
   * AcademicArticles that deal with Nutch
   * [[FAQ]]


[Nutch Wiki] Trivial Update of "KimberlyB" by KimberlyB

2013-03-28 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "KimberlyB" page has been changed by KimberlyB:
http://wiki.apache.org/nutch/KimberlyB

New page:
Wonderful to meet up with you, my title is Glory Farlow but you can call me 
something you like.<>
My close friends say it really is not great for me but what I enjoy 
accomplishing is baseball and I would in no way give it up. New York is the 
place I enjoy most but now I'm thinking of other alternatives. Because I was 
eighteen I've been performing as a payroll clerk and it can be some thing I 
genuinely get pleasure from.<>
If you want to obtain out a lot more verify out my website: 
http://www.baskettnrequinshop.com


[Nutch Wiki] Trivial Update of "Lukas36U" by Lukas36U

2013-03-28 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "Lukas36U" page has been changed by Lukas36U:
http://wiki.apache.org/nutch/Lukas36U

New page:
Not much to tell about me really.<>
Finally a part of this community.<>
I just wish Im useful in one way here.<>
<>
Check out my weblog - 
[[http://www.bluewhitestudio.com/modules.php?name=Your_Account&op=userinfo&username=BerndNort|click
 the up coming website page]]


[Nutch Wiki] Trivial Update of "ShellaSha" by ShellaSha

2013-03-28 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "ShellaSha" page has been changed by ShellaSha:
http://wiki.apache.org/nutch/ShellaSha

New page:
Nothing to say about me really.<>
I enjoy of finally being a member of apache.org.<>
<>
I really wish I am useful in one way here.<>
<>
Here is my web page - 
[[http://www.comparevouchercodes.com/uk/kiddicare.com|Kiddicare.com promo code]]


[Nutch Wiki] Trivial Update of "Darrell86" by Darrell86

2013-03-28 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "Darrell86" page has been changed by Darrell86:
http://wiki.apache.org/nutch/Darrell86

New page:
Pat Yamamoto is what people call me although it is not the name on my 
childbirth certificate.<>
I presently stay in Alaska but my wife wants us to move. Data processing has 
actually been my career for time. As a man what I truly such as is jetski but I 
cannot make it my occupation really. I am running and maintaining a blog site 
here: http://wiki.zart.org/index.<>
php?title=SAA_Vacations_Introduces_Exceptional_12-Day_Group_Luxury_Packages_From__Dollar_4999-plus_taxes<>
<>
Also visit my weblog ... 
[[http://spreadshub.com/index.php?title=Award_Winning_Nature_Photographer_Michael_Lorentz_of_Passage_to_Africa_Safaris_Shares_his_2012_Safari_Olympics_Medal_Winners|click
 through the following website]]


[Nutch Wiki] Trivial Update of "WKEShosha" by WKEShosha

2013-03-28 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "WKEShosha" page has been changed by WKEShosha:
http://wiki.apache.org/nutch/WKEShosha

New page:
Howdy. Let me start by telling the author's name 3 ) Alonso and he totally 
loves the foregoing name. Data doling is what he does but soon a wife and him 
will start extremely own business. As one man what he really loves is watching 
movie pictures and he is trying to make it then a profession. Regarding the 
he's been in North Carolina.<>
<>
Objective, i'm not good at web design but you may want to check my website: 
http://www.orbitcrazy.com/story.php?title=moving-companies-denver-co-303-835-0415-%7C-denver-movers-%7C-303-835-0415<>
<>
my webpage ... 
[[http://www.tubeflier.com/story.php?title=denver-movers-making-moving-trouble-free|related
 webpage]]


[Nutch Wiki] Trivial Update of "ColinNeff" by ColinNeff

2013-03-28 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "ColinNeff" page has been changed by ColinNeff:
http://wiki.apache.org/nutch/ColinNeff

New page:
Fmacademy - der Dienstleister für Ausbildung und berufliche Weiterbildung in 
der Nähe des Kurfürstendamms im Herzstück der Hauptstadt von Deutschland.<>
Wir bieten Ausbildungen zum Industriemeister für Elektrotechnik / 
Industriemeister für Metall, bieten Ausbildungen in den Bereichen Veranstaltung 
und Tourismus, Bürokommunikation und Berufspädagoge aus. Investieren Sie in 
Ihre Zukunft und qualifizieren Sie sich weiter oder erlernen einen nagelneuen 
Beruf.<>
Wir unterstützen Sie außerdem nach der Ausbildung, einen Vollzeit-Job zu 
finden. Wir sind gespannt auf Ihren Telefonanruf oder Ihre E-Mail-Nachricht. 
Auf unserer WWW-Seite finden Sie die neuen Bildungsangebote und 
selbstverständlich die Kontaktdaten. Bis dann in unseren Räumlichkeiten.<>
<>
<>
Also visit my blog post :: [[http://www.fmacademy.de|Hauptseite]]


[Nutch Wiki] Trivial Update of "Brigette0" by Brigette0

2013-03-28 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "Brigette0" page has been changed by Brigette0:
http://wiki.apache.org/nutch/Brigette0

New page:
Title of the writer is Lita. What her as well as her love is undoubtedly bee 
keeping while she's been absorbing new things presently. Massachusetts is the 
only place she's currently residing in and also her family would like it.<>
After being out of his job for several years he became a dispatcher and your 
girl will be promoted before i write again. See what's new on my website here: 
http://asianlounge.com/blogs/entry/Planning-on-moving-to-LA<>
<>
Here is my blog post 
[[http://heliumbeta.com/story.php?title=planning-on-moving-to-la|Anaheim CA 
cost of movers]]


[Nutch Wiki] Trivial Update of "ErikDayto" by ErikDayto

2013-03-28 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "ErikDayto" page has been changed by ErikDayto:
http://wiki.apache.org/nutch/ErikDayto

New page:
My name is Erik Dayton. I life in Reichenbach-Steegen (Germany).<>
<>
<>
Here is my web blog :: 
[[http://www.musclerevxtremereviews.org/|www.musclerevxtremereviews.org]]


[Nutch Wiki] Trivial Update of "Hairstyle_to_have_get-togethers_and_home" by BlytheSQQ

2013-03-28 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "Hairstyle_to_have_get-togethers_and_home" page has been changed by 
BlytheSQQ:
http://wiki.apache.org/nutch/Hairstyle_to_have_get-togethers_and_home

New page:
Visiting the walkway is amongst the most very special time in a very very 
junior person's life span, which of course is a conference they most likely 
will keep in mind forever. Little girls especially find it imperative, perhaps 
due to the fact which these be able to place charming evening wear, 
tastefulness and feel and look elegant. Due to this this piece of article 
struggles to assist you females with guidance about the way to make certain 
that their other prom evening will certainly be wonderful and nothing appalling 
will occur. Now we will specialize here on promenade hairstyling trend quite a 
while hair color, because they are more rapidly to arrive at and look after all 
the way through the dark.<>
<>
You must have a lifelong, or convenient to problem solved trend for grand 
events like the [[http://www.hairstylesfitter.com|hairstyle]] walk or a wedding 
event, merely because keep doing this for many hours, there's truly shows, 
mingling, and all shapes and sizes of stroke that is able to impact on a girl's 
appearance. As a result, your location class dance hairdos for very long hair 
color need to be decided on cautiously, on par with see your face, the feel of 
a persons leg hair, your wedding dress and sometimes even your desired 
behavior. Even though answer may be to operate a large amount of pendant and 
hairspray in order to repair an updo, it is also wise to employ a form that 
doesn't require as great energy, looks an increased and commit to be superb 
completely.<>
<>
Most girls - and even Merseyside figures - prefer to place their other mane 
easily more recently, with just a side chunk plus some giant, forfit secure. 
This theme looks nice, kempt and classy, but you really do need theme swindle 
good food so that you can perform a finished look. Ultimately laundry nice 
hair, dry out it, begin using quite a number defending guide, and next put upon 
the greatest curlers you select from. When the mane is become dry, take the 
curlers off and smoothly scour the tresses utilizing your fingers. Drag it on 
one border of persons skin, thereafter hold it with an ornate grip and you need 
started perhaps the plainest wander haircut quite a while hair.<>
<>
Next one of most saunter design for extended tresses i e sought after now could 
be the fishtail ponytail. Because of technique, as much as you would benefit to 
educate yourself on the best way to effectuate this braid, and after which 
elect it doesn't matter if you wish to it much more or looser. You are also 
able to finalize it by including a little bit of imitation crops or shape 
hairpins. In able for hairstlye free within this full natural splendor, it is 
possible you can extend it, or increase damaged join mane to border that 
person. In all respects, remind yourself that it needs to into a haircut you 
have enough confidence wear, and your guaranteed to look lovely.


[Nutch Wiki] Trivial Update of "BlytheSQQ" by BlytheSQQ

2013-03-28 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "BlytheSQQ" page has been changed by BlytheSQQ:
http://wiki.apache.org/nutch/BlytheSQQ

New page:
Not much to write about myself at all.<>
Finally a member of this site.<>
I really hope I'm useful in some way here.<>
<>
Here is my website; [[http://www.hairstylesfitter.com|Hairstyles]]


[Nutch Wiki] Trivial Update of "AnneGarbe" by AnneGarbe

2013-03-28 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "AnneGarbe" page has been changed by AnneGarbe:
http://wiki.apache.org/nutch/AnneGarbe

New page:
I've would like to introduce me to you, I am Teodoro Gleason. What me and my 
best family love is lacemaking though I haven't made a cent with it.<>
My family lives all the way through Oregon. Since I was eighteen I've been 
working as a new good office clerk. See what's new on my net page here: 
http://www.mapleforest.co.uk


[Nutch Wiki] Trivial Update of "RamiroIoi" by RamiroIoi

2013-03-28 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "RamiroIoi" page has been changed by RamiroIoi:
http://wiki.apache.org/nutch/RamiroIoi

New page:
Even extremely little newborns and small children was carrying bonnets.<>
They are able to have exclusive which implies, for example spiritual or 
cultural.<>
<>
My web page ... [[http://business.blinkweb.com/editor.html|cheap snapbacks]]


[Nutch Wiki] Trivial Update of "ElissaGaf" by ElissaGaf

2013-03-28 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "ElissaGaf" page has been changed by ElissaGaf:
http://wiki.apache.org/nutch/ElissaGaf

New page:
Creator is called Curtis Corcoran. He will be fond of cats but he lacks the 
time lately.<>
<>
Auditing has been his day job for some time after but his promotion never 
comes. The mans house is of course in Oklahoma. Certainly be a realistic good 
at web design but you may want to check my website: 
http://www.back2link.com/story.php?title=spirals-boracay-hotel-boracay-mega-travel<>
<>
my blog post ... 
[[http://canfriends.com/blogs/21054/31627/snorkeling-at-crocodile-island-b|just 
click the following document]]


[Nutch Wiki] Trivial Update of "Xiomara23" by Xiomara23

2013-03-28 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "Xiomara23" page has been changed by Xiomara23:
http://wiki.apache.org/nutch/Xiomara23

New page:
It is a straightforward to wash appliance because of its rounded housing plus a 
dough capacity of eleven cups, that is considered a huge job and allows 
alternate speeds for mixing.<>
One of the very best cheap food processors tested for that price was this model 
designed for $24. <>
<>
My name: Xiomara Acevedo<>
Age: 23<>
Country: United States<>
Home town: Salt Lake City <>
ZIP: 84119<>
Address: 788 Buck Drive<>
<>
My web page ... 
[[http://web.hsunity.com/index.php?do=/profile-8722/info/|web.hsunity.com]]


[jira] [Commented] (NUTCH-1538) tuning of loaded fields during fetcherJob start-up

2013-03-28 Thread lufeng (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616250#comment-13616250
 ] 

lufeng commented on NUTCH-1538:
---

Yes. However, we cannot guarantee that other plugins written by users will not 
use the corresponding field values in the WebPage class. 

> tuning of loaded fields during fetcherJob start-up
> --
>
> Key: NUTCH-1538
> URL: https://issues.apache.org/jira/browse/NUTCH-1538
> Project: Nutch
>  Issue Type: Improvement
>  Components: fetcher
>Affects Versions: 2.1
> Environment: nutch 2.1 / cassandra 1.2.1 / gora-cassandra 0.2 / 
> gora-core 0.2.1 
> running fetch with parse=true
>Reporter: Roland von Herget
> Attachments: NUTCH-1538-FetcherJob-v1.patch
>
>
> The main problem is that Nutch loads nearly every row & column from the DB during 
> start-up of a fetcherJob when fetcher.parse=true.
> A parserJob needs e.g. the CONTENT field from the DB in order to parse.
> The fetcherJob adds all fields of the parserJob to its needed fields if 
> running with fetcher.parse=true. [FetcherJob.getFields()]
> If the Nutch configuration saves all fetched data to the DB 
> (fetcher.store.content=true), you'll end up loading GBs of unused content 
> during fetcherJob start-up.
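
The field-loading behaviour described above can be illustrated with a minimal sketch. This is not the actual FetcherJob.getFields() code; the class name, the Field enum and the two field sets are hypothetical stand-ins. It only shows how merging the parser's fields into the fetcher's field set pulls the large CONTENT column into every start-up query when parsing is enabled.

```java
import java.util.EnumSet;
import java.util.Set;

public class FetcherFieldsSketch {
    // Hypothetical stand-in for storage columns; not the real WebPage field enum.
    enum Field { BASE_URL, STATUS, FETCH_TIME, CONTENT, CONTENT_TYPE }

    // Assumed fields the fetcher itself needs (small columns only).
    static final Set<Field> FETCHER_FIELDS =
        EnumSet.of(Field.BASE_URL, Field.STATUS, Field.FETCH_TIME);

    // Assumed fields the parser needs, including the potentially huge CONTENT.
    static final Set<Field> PARSER_FIELDS =
        EnumSet.of(Field.CONTENT, Field.CONTENT_TYPE);

    // Returns the set of fields to load at start-up. With parsing enabled,
    // the parser's fields are merged in, so CONTENT is read for every row.
    static Set<Field> fieldsToLoad(boolean parseWhileFetching) {
        Set<Field> fields = EnumSet.copyOf(FETCHER_FIELDS);
        if (parseWhileFetching) {
            fields.addAll(PARSER_FIELDS);
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println("fetcher.parse=false -> " + fieldsToLoad(false));
        System.out.println("fetcher.parse=true  -> " + fieldsToLoad(true));
    }
}
```

In this sketch, the only way to keep CONTENT out of the start-up query is to leave the parser fields out of the merged set, which is the kind of tuning the issue title refers to.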

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (NUTCH-1547) BasicIndexingFilter - Problem to index full title

2013-03-28 Thread lufeng (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616227#comment-13616227
 ] 

lufeng commented on NUTCH-1547:
---

Feng: Committed revision 1462078 to trunk and revision 1462079 to 2.x.


> BasicIndexingFilter - Problem to index full title
> -
>
> Key: NUTCH-1547
> URL: https://issues.apache.org/jira/browse/NUTCH-1547
> Project: Nutch
>  Issue Type: Bug
>  Components: indexer
>Affects Versions: 1.6, 2.1
>Reporter: Gustavo Rauber
>Assignee: lufeng
>Priority: Minor
> Fix For: 1.7, 2.2
>
> Attachments: NUTCH-1547-2x.patch, NUTCH-1547.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> I ran into this issue when trying to index the entire title, as is possible 
> for the content, by setting indexer.max.title.length to -1 in 
> nutch-default.xml. I think the behavior should be the same as for the 
> content.
> If you would like to fix it, just replace line 90:
> if (title.length() > MAX_TITLE_LENGTH) {  // truncate title if needed
> with this one:
> if (MAX_TITLE_LENGTH > -1 && title.length() > MAX_TITLE_LENGTH) {  // truncate title if needed
> Stack Trace:
> java.lang.StringIndexOutOfBoundsException: String index out of range: -1
>   at java.lang.String.substring(String.java:1937)
>   at 
> org.apache.nutch.indexer.basic.BasicIndexingFilter.filter(BasicIndexingFilter.java:91)
>   at 
> org.apache.nutch.indexer.IndexingFilters.filter(IndexingFilters.java:109)
>   at 
> org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:272)
>   at 
> org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:53)
>   at 
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:519)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
>   at 
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
> Cheers.
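
The guard proposed in the report can be tried in isolation. The following is only a minimal sketch, not the actual BasicIndexingFilter code: the class name and the truncateTitle helper are hypothetical, and the limit is passed as a parameter instead of being read from indexer.max.title.length. It assumes, as the report suggests, that -1 is meant to disable truncation.

```java
public class TitleTruncationSketch {

    // Hypothetical helper: returns the title cut to maxTitleLength characters,
    // or the full title when maxTitleLength is -1 (assumed to mean "no limit").
    static String truncateTitle(String title, int maxTitleLength) {
        // Without the "maxTitleLength > -1" check, a limit of -1 would reach
        // substring(0, -1), which throws StringIndexOutOfBoundsException.
        if (maxTitleLength > -1 && title.length() > maxTitleLength) {
            return title.substring(0, maxTitleLength);
        }
        return title;
    }

    public static void main(String[] args) {
        System.out.println(truncateTitle("Apache Nutch", 6));   // prints "Apache"
        System.out.println(truncateTitle("Apache Nutch", -1));  // prints "Apache Nutch"
    }
}
```

With the guard in place, a limit of -1 skips the substring call entirely, which is what avoids the exception shown in the stack trace.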



[jira] [Resolved] (NUTCH-1547) BasicIndexingFilter - Problem to index full title

2013-03-28 Thread lufeng (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lufeng resolved NUTCH-1547.
---

Resolution: Fixed




[Nutch Wiki] Trivial Update of "RosarioGi" by RosarioGi

2013-03-28 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "RosarioGi" page has been changed by RosarioGi:
http://wiki.apache.org/nutch/RosarioGi

New page:
They call the author Gigi. Some time back she selected to dwell in Texas.
Bookkeeping is where her primary revenue will come from and she will not modify 
it at any time shortly. To engage in domino is what she does just about every 
week.

Stop by my page [[http://absolutelogo.com|related internet page]]


Re: Important : Bunch of Spam Created under Nutch Wiki!!

2013-03-28 Thread kiran chitturi
Thank you, Ken, for the information. I think access is already restricted
to Contributors Only. Can someone please confirm whether that is the case?

Anyone who needs to edit the wiki would then have to ask someone for access
to the wiki pages.

Do you know whether Solr still got hit by spam after locking down its wiki?

Thanks,


On Thu, Mar 28, 2013 at 1:40 AM, Ken Krugler wrote:

>
> On Mar 27, 2013, at 6:54pm, kiran chitturi wrote:
>
> Thank you, Binoy, for reporting this.
>
> We have been monitoring the pages and deleting them when we get time, but
> more keep coming. Today I saw a spam edit on the home page of the Nutch
> wiki; it inserted spam links under the tutorials.
>
> We need to find a permanent solution to this. I wonder whether any other
> list-servs are facing the same issue.
>
>
> Yes - Solr recently had to lock down editing on their wiki:
>
> The wiki at http://wiki.apache.org/solr/ has come under attack by
> spammers more frequently of late, so the PMC has decided to lock it down in
> an attempt to reduce the work involved in tracking and removing spam.
>
> From now on, only people who appear on
> http://wiki.apache.org/solr/ContributorsGroup will be able to
> create/modify/delete wiki pages.
>
> Please request either on the solr-u...@lucene.apache.org or on
> d...@lucene.apache.org to have your wiki username added to the
> ContributorsGroup page - this is a one-time step.
>
>
> So I think you need to make a request to Infra to lock down the wiki, then
> add people (generally in response to explicit requests) to the
> ContributorsGroup page.
>
> -- Ken
>
>
>
>
> On Thu, Mar 28, 2013 at 12:49 AM, Binoy d  wrote:
>
>> I am quite surprised by the notifications I am getting for new pages on
>> the Nutch Wiki.
>> Example:
>> http://wiki.apache.org/nutch/KarlPuent
>>
>> I see at least 25-35 emails with such notifications.
>>
>> All of the links I got are rooted under http://wiki.apache.org/nutch/
>>
>>
>> Is someone looking into this? If needed, I can gladly forward the emails
>> to the person cleaning it up, as I am not sure whether everyone has access
>> to delete the pages.
>>
>> Regards,
>> b
>>
>> -- Forwarded message --
>> From: Apache Wiki 
>> Date: Wed, Mar 27, 2013 at 9:32 PM
>> Subject: [Nutch Wiki] Trivial Update of "EdwinaBro" by EdwinaBro
>> To: Apache Wiki 
>>
>>
>> Dear Wiki user,
>>
>> You have subscribed to a wiki page or wiki category on "Nutch Wiki" for
>> change notification.
>>
>> The "EdwinaBro" page has been changed by EdwinaBro:
>> http://wiki.apache.org/nutch/EdwinaBro
>>
>> New page:
>> I am 24 years old and my name is Edwina Brownlee. I life in Corjolens
>> (Switzerland).
>>
>>
>> Take a look at my web-site ... [[http://modform.org/SolomonKr|Continue]]
>>
>>
>
>
> --
> Kiran Chitturi
>
> 
>
>
>
>--
>  Ken Krugler
> +1 530-210-6378
> http://www.scaleunlimited.com
> custom big data solutions & training
> Hadoop, Cascading, Cassandra & Solr
>
>
>
>
>
>


-- 
Kiran Chitturi




[Nutch Wiki] Trivial Update of "beiibinoux2013" by beiibinoux2013

2013-03-28 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "beiibinoux2013" page has been changed by beiibinoux2013:
http://wiki.apache.org/nutch/beiibinoux2013

New page:
My name is Matthias Alves. I live in Loicheckgegend (Austria).

Here is my blog :: 
[[http://aristotle.oneonta.edu/35_milne_library_news/archive/746_artwork_by_jian_cui_professor_suny_oneonta_art_department.html|aristotle.oneonta.edu]]


[jira] [Updated] (NUTCH-1538) tuning of loaded fields during fetcherJob start-up

2013-03-28 Thread Roland von Herget (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roland von Herget updated NUTCH-1538:
-

Attachment: NUTCH-1538-FetcherJob-v1.patch

for branches/2.x - #1461969
- remove loading of ParserJob fields during a FetcherJob with parse=true
- 'works for me'


> tuning of loaded fields during fetcherJob start-up
> --------------------------------------------------
>
>                 Key: NUTCH-1538
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1538
>             Project: Nutch
>          Issue Type: Improvement
>          Components: fetcher
>    Affects Versions: 2.1
>         Environment: nutch 2.1 / cassandra 1.2.1 / gora-cassandra 0.2 / 
> gora-core 0.2.1 
> running fetch with parse=true
>            Reporter: Roland von Herget
>         Attachments: NUTCH-1538-FetcherJob-v1.patch
>
>
> The main problem is that nutch loads nearly every row and column from the 
> DB during start-up of a fetcherJob when fetcher.parse=true.
> A parserJob needs, e.g., the CONTENT field from the DB in order to parse.
> When running with fetcher.parse=true, the fetcherJob adds all fields of the 
> parserJob to its own needed fields. [FetcherJob.getFields()]
> If the nutch configuration saves all fetched data to the DB 
> (fetcher.store.content=true), you end up loading GBs of unused content 
> during fetcherJob start-up.
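
The field selection described above can be sketched roughly as follows. This is a hypothetical, self-contained illustration and not the actual Nutch code: the Field enum, the two field sets and both method names are made up for the example. Only the idea comes from the report, namely that a fetcherJob with parse=true also requests the parserJob's fields (CONTENT among them), and that the attached patch stops doing so.

```java
import java.util.EnumSet;
import java.util.Set;

public class FetcherFieldsSketch {

    // Hypothetical column names; the real schema is not shown in the report.
    enum Field { BASE_URL, STATUS, FETCH_TIME, CONTENT, TEXT, TITLE }

    // Assumed field sets for the two jobs, for illustration only.
    static final Set<Field> FETCHER_FIELDS =
            EnumSet.of(Field.BASE_URL, Field.STATUS, Field.FETCH_TIME);
    static final Set<Field> PARSER_FIELDS =
            EnumSet.of(Field.BASE_URL, Field.CONTENT, Field.TEXT, Field.TITLE);

    // Behaviour described in the report: with parse=true the parser's fields
    // are added, so previously stored CONTENT is read at start-up.
    static Set<Field> fieldsIncludingParser(boolean parse) {
        Set<Field> fields = EnumSet.copyOf(FETCHER_FIELDS);
        if (parse) {
            fields.addAll(PARSER_FIELDS);
        }
        return fields;
    }

    // Behaviour the patch aims for, as summarised in the comment: the parser's
    // fields are not loaded, whatever the value of parse.
    static Set<Field> fieldsExcludingParser(boolean parse) {
        return EnumSet.copyOf(FETCHER_FIELDS);
    }

    public static void main(String[] args) {
        System.out.println(fieldsIncludingParser(true).contains(Field.CONTENT));  // prints "true"
        System.out.println(fieldsExcludingParser(true).contains(Field.CONTENT));  // prints "false"
    }
}
```

The second variant keeps the read set small, so large stored CONTENT values would not be pulled from the DB before fetching starts.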



[Nutch Wiki] Trivial Update of "AmandaKka" by AmandaKka

2013-03-28 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "AmandaKka" page has been changed by AmandaKka:
http://wiki.apache.org/nutch/AmandaKka

New page:
There is nothing to write about me really.

Also visit my webpage; [[http://onlineblackjackforrealmoney1.com|blackjack 
online for real money]]