[Nutch-dev] [jira] Commented: (NUTCH-497) Extreme Nested Tags causes StackOverflowException in DomContentUtils...Spider Trap

2007-06-21 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506775 ] Andrzej Bialecki commented on NUTCH-497: - The patch looks good to me as it is now - however, I've seen

[Nutch-dev] Found the bug in Generator when number of URLs is small

2007-06-21 Thread Vishal Shah
Hi, I think I found the reason why the generator returns with an empty fetchlist for small fetchsizes. After the first job finishes running, the generator checks the following condition to see if it got an empty list: if (readers == null || readers.length == 0 ||

Re: [Nutch-dev] Found the bug in Generator when number of URLs is small

2007-06-21 Thread Doğacan Güney
On 6/21/07, Vishal Shah [EMAIL PROTECTED] wrote: Hi, I think I found the reason why the generator returns with an empty fetchlist for small fetchsizes. After the first job finishes running, the generator checks the following condition to see if it got an empty list: if (readers

[Nutch-dev] Hudson build is back to normal: Nutch-Nightly #124

2007-06-21 Thread hudson
See http://lucene.zones.apache.org:8080/hudson/job/Nutch-Nightly/124/changes

[Nutch-dev] [jira] Created: (NUTCH-503) Generator exits incorrectly for small fetchlists

2007-06-21 Thread Vishal Shah (JIRA)
Generator exits incorrectly for small fetchlists
Key: NUTCH-503
URL: https://issues.apache.org/jira/browse/NUTCH-503
Project: Nutch
Issue Type: Bug
Components: generator

[Nutch-dev] [jira] Updated: (NUTCH-503) Generator exits incorrectly for small fetchlists

2007-06-21 Thread Vishal Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal Shah updated NUTCH-503: -- Attachment: emptyfetchlist.patch I've created a patch to fix this issue. Please review, and commit it

[Nutch-dev] [jira] Updated: (NUTCH-503) Generator exits incorrectly for small fetchlists

2007-06-21 Thread Vishal Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vishal Shah updated NUTCH-503: -- Attachment: emptyfetchlist.patch Hi, The previous patch is missing a header line. I've reattached

[Nutch-dev] http.content.limit not respected when the Content-Type header has charset attributes

2007-06-21 Thread Vishal Shah
Hi, Many of the urls we crawl have headers that look like this:
Connection: close
Date: Thu, 21 Jun 2007 09:28:42 GMT
Accept-Ranges: bytes
ETag: 2c0c3-650-cc1eb800
Server: Apache/2.0.40 (Red Hat Linux)
Content-Length: 1616
Content-Type: text/html; charset=ISO-8859-1
Last-Modified: Mon, 09
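The message above is cut off before it explains why the limit is ignored, so the following is only an illustrative sketch, not Nutch's code and not the poster's fix. It shows one way a Content-Type value such as the one in the quoted headers can be split into its bare media type and its charset parameter. The class name ContentTypeSketch and both method names are hypothetical.

```java
import java.util.Locale;

// Hypothetical helper, not part of Nutch: splits a Content-Type header value
// into its media type and its charset parameter.
class ContentTypeSketch {

    // Returns the bare media type (e.g. "text/html"), dropping any parameters.
    static String mediaType(String contentType) {
        if (contentType == null) {
            return null;
        }
        int semi = contentType.indexOf(';');
        String base = semi >= 0 ? contentType.substring(0, semi) : contentType;
        return base.trim().toLowerCase(Locale.ROOT);
    }

    // Returns the charset parameter value, or null if none is present.
    static String charset(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type; parameters start at index 1.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq > 0 && param.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                String value = param.substring(eq + 1).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String header = "text/html; charset=ISO-8859-1";
        System.out.println(mediaType(header));
        System.out.println(charset(header));
    }
}
```

Comparing on the bare media type rather than the full header string is one way to keep a charset attribute from changing how a response is classified; whether that is relevant to the problem reported here cannot be determined from the truncated message.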

Re: [Nutch-dev] Found the bug in Generator when number of URLs is small

2007-06-21 Thread Vishal Shah
Hi Dogacan, I've uploaded the patch to Nutch-503. http://issues.apache.org/jira/browse/NUTCH-503 Regards, -vishal. -Original Message- From: Dogacan Güney [mailto:[EMAIL PROTECTED] Sent: Thursday, June 21, 2007 12:33 PM To: [EMAIL PROTECTED]; [EMAIL PROTECTED] Subject: Re: Found the

[Nutch-dev] [jira] Commented: (NUTCH-471) Fix synchronization in NutchBean creation

2007-06-21 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506883 ] Doğacan Güney commented on NUTCH-471: - We have been using this on our machines for some time, so if there are no

[Nutch-dev] [jira] Commented: (NUTCH-497) Extreme Nested Tags causes StackOverflowException in DomContentUtils...Spider Trap

2007-06-21 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506894 ] Dennis Kubes commented on NUTCH-497: I agree, I think it would be better to have something generic if we are

[Nutch-dev] Invoices delivered to your door!

2007-06-21 Thread dj
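Partners welcome * invoices delivered to your door. On the principle of mutual benefit, our company issues invoices on behalf of others at reasonable rates. Scope: merchandise sales, advertising, "computer-printed" transport invoices, other services, leasing, construction and installation, fixed-amount catering invoices, and so on. If your company needs these for bookkeeping or for input and output tax purposes, we can provide a full range of services. The discount rate depends on the amount invoiced, and we suggest long-term cooperation. We solemnly promise that all invoices can be checked online or verified before payment! Contact: 刘晓华 Tel: 13714089426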

[Nutch-dev] [jira] Commented: (NUTCH-503) Generator exits incorrectly for small fetchlists

2007-06-21 Thread Emmanuel Joke (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506922 ] Emmanuel Joke commented on NUTCH-503: - I just tried your patch and I'm afraid I still have the same issue.

[Nutch-dev] [jira] Resolved: (NUTCH-471) Fix synchronization in NutchBean creation

2007-06-21 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doğacan Güney resolved NUTCH-471. - Resolution: Fixed Assignee: Doğacan Güney Committed in rev. 549507 with minor style

[Nutch-dev] (no subject)

2007-06-21 Thread 四海恒泰
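One computer and one account are all you need to send and receive faxes. E-fax user manual. Sending an e-mail? It is hard to know whether the recipient received it right away. Sending a fax? Without a fax machine, paper, and ink you are stuck. Now there is a new solution: sending and receiving electronic faxes. An electronic fax is a network fax method that connects fax machines to the Internet. With it, you need only one Internet-connected computer to receive and send faxes. The benefits of electronic fax need no elaboration: it saves effort and money and is environmentally friendly. So, let's learn how! You can find some electronic fax websites through a search engine, such as the widely used www.hi-fax.com. First go there and apply for an account (which includes a fax mailbox with a corresponding fax number).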

[Nutch-dev] 顺达实业公司

2007-06-21 Thread 张生
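Hello: Our company, 顺达实业公司, has invoices that can be issued on behalf of others, for both national tax and local tax. If you need this, please call us. Contact: 张俊生 Tel: 13560376098 [EMAIL PROTECTED]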