Re: A modest proposal - allow the ID repository to hold xml

2003-09-04 Thread wang liang
I think there should be a working group to discuss possible new formats for
I-Ds. PDF or another format could be more efficient than the plain-text
version. Why should we stick to ASCII text?




How to create ID in Windows system?

2003-09-07 Thread wang liang
I want to create an I-D in Windows. Is there any software like nroff and
troff for Windows? Would MS Word or something else be useful? Thanks.




how to get a filename for ID?

2003-09-10 Thread wang liang
How do I get a filename for an I-D? I have sent a request to
[EMAIL PROTECTED] for a filename for a new I-D, but there has been no
response so far. It's an individual I-D. Would it be OK to create a filename
myself?




How can I get more advice for my I-D?

2003-09-23 Thread wang liang
I submitted my I-D several weeks ago, but I haven't received a single piece
of advice. The I-D is about a search protocol for libraries. Libraries now
hold more and more databases, and we often have to search many of them one
by one to get precise and comprehensive records. That is not a pleasant job.
Our protocol solves exactly this problem.
I think it is a useful protocol, though it has some flaws. How can I get
more advice on it? I can't find a working group for it. Should I just create
one?




How to build a new group?

2003-10-02 Thread wang liang


If I find that there is no group for some important Internet issue, and it
may be necessary to build a new one, what should I do? Who deals with this
kind of suggestion? Where can I find the detailed process for requesting a
new working group? Thanks.




Propose some information retrieval protocols for Internet

2003-12-24 Thread wang liang
Propose some information retrieval protocols for Internet.

Most Internet services, such as e-mail, BBS, and FTP, are based on public
protocols. There is no secret technology in these services. But information
retrieval, which may be the most important service on the Internet, is still
dominated by a few search engine companies. This may not comply with the
basic rules of the Internet.

Some core technology, especially the page ranking algorithms, is the top
secret of the search engine companies. There is no oversight of the ranking
operation. Some search engines simply raise the ranking score of certain
pages in return for payment from corporate customers. All of us publish
information on the Internet, but search engines determine to a large extent
what you can find and what you cannot. Someone only has to pay money to have
their spam placed first in everyone's search results. This may not be the
true spirit of the Internet. Search engines should be a public mechanism, so
there should be standard protocols for building a truly public, open search
system for the Internet.

Some of our research group's work may be useful for appropriate information
retrieval protocols for the Internet.

Our work belongs to a digital library project. The main purpose of the
project is to build a search system that can integrate all kinds of
information resources on the Internet. To this end, we propose an entirely
new kind of search system, DRIS (Domain Resource Integrated System). Its
main points are as follows.

DRIS will build an information retrieval infrastructure for the whole
Internet, not the final search engines.

The basic principle of DRIS: within an appropriate scope (three domain
levels), applying an appropriate information retrieval system, DRIS builds
an appropriate information management framework for the whole Internet.

The basic idea of DRIS: search should be an internal function of the
Internet, and everyone should have his own search engine.

Related papers
1 Make search become the internal function of Internet.
http://arxiv.org/abs/cs.IR/0311015
2 Evolution:Google vs.GRIS. http://arxiv.org/abs/cs.DL/0312024

We have built a testbed for DRIS. I think building a public information
retrieval system is very important for the evolution of the Internet.





Re: Propose some information retrieval protocols for Internet

2003-12-25 Thread wang liang
> I believe you are talking about information *indexing* service,
> not "information *retrieval* service"

DRIS will build the information retrieval infrastructure for the Internet,
not the final search engines. Many intelligent search systems can use DRIS
as their data source and provide high-quality personal search services.

> DRIS is only a design.
> When you say "build", who will do it? How does it get paid?
> (i.e. What is the incentive for anyone to do this?)

As I mentioned in our paper, DRIS will improve the performance of Internet
search engines in recency, coverage, and so on, but this alone cannot ensure
that DRIS gets established.

The architecture of DRIS is organization level-sub country Internet
level-country level-whole Internet level. DRIS will first solve some urgent
problems at the bottom level and then move up to the top level. In our
testbed, CERNET (China Education and Research Network), only a few
universities have a web search engine for the campus network. Furthermore,
most universities have many distinctive information resources such as FTP,
BBS, and special databases in the library, but almost no university has a
unified search system that can efficiently integrate all these resources. To
find comprehensive information, we always have to search through many search
interfaces one by one. That is the problem at the organization level. There
is also still no efficient way to share these resources among different
universities. That is the problem at the sub-country Internet level. All of
this creates the demand for the underlying structure of DRIS. Solving one's
own urgent problems first and then benefiting others may be the real
guarantee of the success of DRIS.

Who controls DRIS? It is administered by none of us and by every one of us.
DRIS is managed by its users and coordinated by a public organization, just
like the management of DNS. Every organization is its customer and also its
builder and manager. That is the true spirit of the Internet. DRIS is a
public, open system that needs no profit from its users and, of course, no
advertisements or corporate spam.

> I had a quick look of your first paper.
> - It seems to suggest that each DNS domain has a central authority,
>   which may not be the case
> - It is unclear to me DNS domains are the right unit for indexing
>   webs, as opposed to topical areas.


Current search engines are all managed by their respective companies. This
is a centralized management method, and it is not suitable for managing the
information on the Internet. There are now billions of web pages, millions
of databases, and many other kinds of information resources on the Internet.
A search engine will run into many bottlenecks when the size of its database
reaches certain critical values. In fact, even as a pure web page search
engine, it cannot continue to index close to the entire Web as the Web
grows. The update interval of most page databases is now almost one month.
We can also obtain information from various special databases such as IEEE's
digital library, FTP, P2P, etc. Can you imagine a single private company
efficiently administering all these information resources?

Every search engine tries to provide comprehensive and fresh information to
its users, but none of them can build a database system that mirrors the
whole Internet.

So a distributed management framework may be more appropriate for the
Internet. In our experience, decentralized management is much more effective
than fully centralized administration in a large-scale system. With this
approach, the key issue is how to divide the Internet correctly. We found
that a usable division method already exists on the Internet: the Domain
Name System (DNS). DNS is a hierarchical distributed system. All the web
sites on the Internet are efficiently managed in this system. The basic
architecture of DNS is also organization level-sub country Internet
level-country level. We just apply its basic idea to DRIS and do not
strictly follow it.
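To make the division concrete, here is a minimal sketch of how a host name could be mapped onto the three levels. It assumes names of the form <organization>.<sub-country>.<country> (for example hust.edu.cn); the function name and the returned keys are made up for illustration and are not part of any DRIS specification. Since DRIS only borrows the idea of DNS loosely, real names will not always fit this pattern.

```python
def domain_levels(hostname: str) -> dict:
    """Split a host name into three DRIS-style domain levels.

    Assumption: the name ends in <organization>.<sub-country>.<country>,
    e.g. hust.edu.cn. Real DNS names do not always follow this pattern.
    """
    labels = hostname.lower().rstrip(".").split(".")
    if len(labels) < 3 or not all(labels):
        raise ValueError(f"need at least three non-empty labels: {hostname!r}")
    return {
        "country": labels[-1],
        "sub_country": ".".join(labels[-2:]),
        "organization": ".".join(labels[-3:]),
    }


levels = domain_levels("www.lib.hust.edu.cn")
print(levels)
```

A page on www.lib.hust.edu.cn would then be indexed by the server for hust.edu.cn, whose metadata could be passed up to edu.cn and then to cn.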


>
> bottomline: yes the current google domination is not sustainable, however
> the basis of DRIS design raises its own problems.

Although I can't say DRIS is better than Google right now, it can surely
meet some demands that Google can't. In fact, DRIS will build a system that
can integrate all kinds of resources, not only web pages. The first testbed
of DRIS on CERNET will be finished in fall 2004. Practice is the only
criterion for judging a theory. More discussion is also very important for a
new system.






Re: Propose some information retrieval protocols for Internet

2003-12-27 Thread wang liang
Paul Robinson,

>We can tweak the spec and the technology to put ownership back
> where it belongs, but it needs some thought.

Related papers in Word format were sent to you.

In fact, our system can be regarded as an information sharing platform. To
integrate different kinds of information resources, DRIS provides two
methods.

1 Distributed search interface. The corresponding protocol defines a
platform-independent search interface and a collection description standard
for heterogeneous information resources. An I-D, "information retrieval
protocol for digital resources", has been proposed.

2 Metadata harvesting. It will define a standard metadata format that is
compatible with most database systems. A protocol based on an available open
standard such as OAI will be proposed.

As long as data sources provide the standard distributed search interface or
comply with the metadata harvest format, they can all be brought into DRIS.
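The two methods can be sketched roughly as follows. This is only an illustration of the idea, not the interface defined in the I-D: the class names, the Record fields, and the toy sources are all assumptions. A source either answers queries itself (method 1) or hands over its metadata to be indexed by the node (method 2).

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass(frozen=True)
class Record:
    identifier: str   # hypothetical fields, chosen for the example
    title: str
    collection: str


class SearchableSource(Protocol):
    """Method 1: the source answers queries itself (distributed search)."""

    def search(self, query: str) -> List[Record]:
        ...


class HarvestableSource(Protocol):
    """Method 2: the source exports metadata for indexing at the node."""

    def harvest(self) -> List[Record]:
        ...


class Node:
    """A toy node that integrates both kinds of source."""

    def __init__(self) -> None:
        self._remote: List[SearchableSource] = []
        self._harvested: Dict[str, Record] = {}

    def add_searchable(self, source: SearchableSource) -> None:
        self._remote.append(source)

    def add_harvestable(self, source: HarvestableSource) -> None:
        # Copy the metadata once; a real system would re-harvest regularly.
        for record in source.harvest():
            self._harvested[record.identifier] = record

    def search(self, query: str) -> List[Record]:
        q = query.lower()
        # Harvested metadata is matched locally, then remote sources are asked.
        hits = [r for r in self._harvested.values() if q in r.title.lower()]
        for source in self._remote:
            hits.extend(source.search(query))
        return hits


class LibraryCatalog:
    """Toy source that offers its own search interface."""

    def search(self, query: str) -> List[Record]:
        records = [Record("cat:1", "Introduction to Information Retrieval", "library")]
        return [r for r in records if query.lower() in r.title.lower()]


class FtpListing:
    """Toy source that can only export its metadata."""

    def harvest(self) -> List[Record]:
        return [
            Record("ftp:1", "retrieval-tools.tar.gz", "ftp"),
            Record("ftp:2", "readme.txt", "ftp"),
        ]


node = Node()
node.add_searchable(LibraryCatalog())
node.add_harvestable(FtpListing())
results = node.search("retrieval")
print([r.identifier for r in results])
```

The caller sees one result list no matter which method brought a source into the node.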

But web pages are a special data source because of their distributed nature
and large volume. To integrate the web pages on the Internet efficiently,
DRIS will build a public, open web page database, which will strictly follow
the principle of (organization level-conventional database system)-(sub
country Internet level-metadata harvest system)-(country level-distributed
system). All the work is carried out automatically. At the bottom level, for
example, we often don't wish to publish all our pages outside the campus
network. Although all the pages are automatically downloaded and indexed by
our DRIS server, we can limit the web page metadata sent to the higher layer
to accommodate such special considerations.

Other kinds of resources can select an appropriate method and be imported at
the appropriate layer.
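As a rough illustration of limiting what is sent upward, the sketch below exports metadata only for pages under URL prefixes that the organization has marked as public, and only selected fields. The function name, the field names, and the example.edu URLs are assumptions made for this example.

```python
from typing import Dict, Iterable, List, Sequence

PageMeta = Dict[str, str]


def metadata_for_upper_layer(
    pages: Iterable[PageMeta],
    public_prefixes: Sequence[str],
    fields: Sequence[str] = ("url", "title"),
) -> List[PageMeta]:
    """Return the metadata that a bottom-level server would send upward.

    Everything is indexed locally, but only pages whose URL starts with one
    of the public prefixes are exported, and only the listed fields.
    """
    exported = []
    for page in pages:
        if any(page["url"].startswith(prefix) for prefix in public_prefixes):
            exported.append({key: page[key] for key in fields if key in page})
    return exported


# Hypothetical local index of a campus server.
local_index = [
    {"url": "http://www.example.edu/news/1.htm", "title": "Campus news", "body": "..."},
    {"url": "http://intranet.example.edu/grades.htm", "title": "Grades", "body": "..."},
]

exported = metadata_for_upper_layer(local_index, ["http://www.example.edu/"])
print(exported)
```

The intranet page stays searchable inside the campus network but never reaches the higher layer.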





Re: Propose some information retrieval protocols for Internet

2003-12-31 Thread wang liang

Happy new year!

You can find more information about our project at
http://202.114.9.200/English/main.htm
More content will be translated in the coming days.

> Sorry it took so long to get back to you. I've been busy and haven't
> been able to work with these yet, but will aim to do so later on this
> week. I look forward to e-mailing you then.
>





Re: Propose some information retrieval protocols for Internet

2004-01-01 Thread wang liang
For some reason, the IP in CERNET isn't accessible from outside China.
I will solve this problem as soon as possible.


> but I'm more interested in something else: I'd like to see more semantic
> information in search engines. Eventually, this would allow queries
> like "which actors played in movies after books written by authors born
> in Chili in 1961?"


DRIS offers a promising solution for this demand. Considered only as a web
page search system, DRIS will download and index all the web pages in the
three domain levels and provide a standard search interface at every node.
The data format for the input and output of this interface is standard XML,
not HTML. DRIS is not a final search engine; it provides the data source for
other personal intelligent systems, which are designed around your interests
and knowledge background. XML may be the best data format for most
application systems. You can think of DRIS as a mechanism that turns all
HTML sources into XML sources.

Certainly, the XML data in DRIS can comply with the W3C's seven-level
semantic model, which will be a great help in designing intelligent search
systems. But for now, there is still no system that can understand your
query "which actors played in movies after books written by authors born in
Chili in 1961?" very well and give you a precise answer.

So DRIS just builds the information retrieval infrastructure for the
Internet; the other work will be left to AI experts.
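To show why XML output is convenient for a downstream system, here is a small sketch of a node writing results as XML and a client reading them back. The element and attribute names (results, hit, url, title, query) are invented for this example; the actual format would be whatever the protocol defines.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def build_results(query: str, hits: List[Tuple[str, str]]) -> str:
    """Serialize (url, title) pairs as a small XML result document."""
    root = ET.Element("results", {"query": query})
    for url, title in hits:
        hit = ET.SubElement(root, "hit")
        ET.SubElement(hit, "url").text = url
        ET.SubElement(hit, "title").text = title
    return ET.tostring(root, encoding="unicode")


def parse_results(xml_text: str) -> List[Tuple[str, str]]:
    """What a personal search system would do with the node's answer."""
    root = ET.fromstring(xml_text)
    return [(hit.findtext("url"), hit.findtext("title")) for hit in root.findall("hit")]


xml_text = build_results(
    "digital library",
    [("http://www.example.edu/dl.htm", "Digital library project")],
)
print(xml_text)
print(parse_results(xml_text))
```

The client gets structured fields directly and does not have to scrape them out of an HTML result page.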





Re: Propose some information retrieval protocols for Internet

2004-01-01 Thread wang liang
> Your original post presents several arguments against the current crop of
> search engines and proposes a new distributed search engine system, but this
> is an _indexing_ function, not a _retrieval_ function.

DRIS is a real search engine.

  Although we say DRIS just builds the information retrieval infrastructure
for the Internet, it provides two kinds of search interface: a user
interface like Google's service, and an API for other intelligent search
systems. Do you feel very satisfied with the search results of current
search engines? They are just crude results. That is why we say DRIS is
information retrieval infrastructure.

> The architecture of DRIS is organization level-sub country Internet
> level-country level-whole Internet level.
> Can you demonstrate why country-level databases are a technical advantage,
> other than providing certain governments the ability to restrict the
> information their citizens can access?

In fact, if governments want to block a web site, there are many other
efficient methods. They needn't restrict it at the information retrieval
stage. On the other hand, with DRIS, everyone will have his own search
engine, so some restrictions could be set in a children's search engine.





Re: Propose some information retrieval protocols for Internet

2004-01-01 Thread wang liang
In fact, our project is just a digital library project. DRIS is its main
part. Another part is [EMAIL PROTECTED], an intelligent search system, which
will directly answer all your questions in the future. For now it can only
answer some questions about library routines. Such a system is also called a
virtual consulting system, which is a traditional research subject in
digital libraries. There are already some excellent systems in some
libraries.

But the main disadvantage of these systems is that their knowledge database
is too small, so they can answer only a few questions. The Internet may be
the biggest knowledge database, so we proposed DRIS, which will act as the
knowledge source for such personal intelligent systems.

After all, we need an answer, not piles of jumbled hyperlinks.




>Any kind of metadata based solution can resolve this context ambiguity.
>Digital library folks have been doing this for years.

> or how about

>http://www.irtf.org/siren/draft-klensin-dns-search-05.txt ?






Re: Propose some information retrieval protocols for Internet

2004-01-02 Thread wang liang

Please, please learn some basic knowledge about search engines.
We are now discussing search engines, one of the most important services of
the Internet.


Asking for advice is the first step; then a working group may be proposed.
So don't worry about where we discuss this issue.

>without looking into DRIS...

>Knowledge base for the whole Internet? The Internet based information
>is so dynamic (see the use SRV and dynamic DNS.) IMO, the hierarchy and
>zone delegation nature of DNS is a scalable one. It is hard to populate
>the knowledge base for the whole Internet. I know Dave clark et all has
>recently proposed a overlay knowledge plane architecture which use
>sensors and actuators to feed knowledge (but, it is for network
>management purpose.)

>In any case, you should take this issue to the right forum. SIREN RG?








Re: Propose some information retrieval protocols for Internet

2004-01-03 Thread wang liang

The link has moved to http://www.lib.hust.edu.cn/dl-lib/English/main.htm
More content will be translated.
 
>you should provide us with a link(http://202.114.9.200/English/main.htm)
>that works. 





Building a new work group for public information retrieval protocol, ask for advices.

2004-01-08 Thread wang liang
After publishing the message "Propose some information retrieval protocols
for Internet", we received much advice. Now we want to build a new working
group for this issue and are asking for more advice. Information retrieval
may overtake e-mail and become the most important service of the Internet,
so we can't neglect it.

The reasons to build a working group for public information retrieval
protocols lie in the disadvantages of current commercial search engines and
the improvements a future public search system would bring.

The faults in commercial search engines.

1 In technology. No search engine today covers 60% of all the pages on the
Internet. The average update interval of their web page databases is almost
one month. This is mainly because none of them can closely keep up with the
explosive growth of web pages on the Internet. But web pages are only one
kind of information resource. There are many others, such as video, special
databases, BBS, etc. Can you imagine a single search engine company
efficiently administering all these information resources?

2 In business model. Many search engine companies are concerned with how to
make a profit from corporate users through advertisement and ranking
prominence, but never consider how their real customers will feel. Search
engines were originally tools for the convenience of Internet users, but
search engine companies have to rely on advertising or selling ranking
prominence, both of which get in the way of information retrieval, to
sustain themselves. In other words, search engines make money at the cost of
inconvenience to most Internet users, not through the high quality of their
search service.

3 Apart from search engines, Internet services such as e-mail, BBS, and FTP
are all based on public protocols. There is no secret technology in these
services. But information retrieval, which may be the most important service
on the Internet, is still dominated by a few search engine companies. Many
experts know the basic page ranking algorithm, but no one knows its details,
which are a commercial secret. With no public oversight, there is no truly
candid ranking algorithm. But we all know another world-famous algorithm
very well: "money can elevate ranking score". This may not comply with the
basic rules of the Internet, a public and free world.

4 In any free market, the customers should always be God, not a few
companies.


The improvements in the new public search system, DRIS (Domain resources
integrated system)

1 In technology. DRIS will build the information retrieval infrastructure of
the Internet. DRIS applies a hierarchical distributed architecture to manage
all the information on the Internet, just like DNS. Its main principle is
(organization level-conventional database system)-(main sub country Internet
level-metadata harvest system)-(country level-distributed search system). In
simple terms, taking the web page system as an example, every DRIS server at
the bottom level, such as a university, will download and index all the web
pages in its local network and then send the metadata to the higher layer.
All other resources are integrated in the same way. So DRIS will improve the
performance of Internet search engines in recency, coverage, and so on.

2 Management. Who will control DRIS? It is administered by none of us and by
every one of us. DRIS is managed by its users and coordinated by a public
organization, just like the management of DNS. Every organization is its
customer and also its builder. That is the true spirit of the Internet. DRIS
is an open system that needs no profit from its users and, of course, no
advertisements.

3 The basic idea of DRIS: "search should be an internal function of the
Internet and everyone should have his own search engine". DRIS just provides
crude search results (like the results of current search engines). Many
intelligent search systems can use DRIS as their data source and provide
high-quality personal or commercial search services. So commercial search
engines can still survive, in the way they should.

4 Although DRIS gives us an excellent and promising solution for a new
public Internet search system, this alone cannot ensure that DRIS gets
established. One important principle in technology is that the best
technology is the one that meets an urgent demand in society. This is the
secret of DRIS. In our testbed, at the organization level, only a few
universities have a web search engine for the campus network, to say nothing
of a unified search system that can efficiently integrate all their
information resources such as FTP, BBS, and special databases in the
library. That is the demand at the third layer. Sharing information
resources between different organizations is also attractive, and that is
the demand for building the second layer of DRIS. At the top layer,
integrating all the information resources on the Internet may be everyone's
dream.

5 Practice is the only criterion for judging a theory. We have now built
some experimental third-layer DRIS servers in HuBei Province. I can only

Re: Building a new work group for public information retrieval protocol, ask for advices.

2004-01-08 Thread wang liang
The main aim of DRIS is to build the information retrieval infrastructure
for the Internet. XML/RDF/Web services are just tools.
In fact, in the IRTF (http://www.irtf.org), an organization for the
"evolution of the future Internet" (supported by the IETF), there has been a
group called "Information Infrastructure Architecture".

This proposed group can be regarded as a continuation of its work.


> I think it's not clear that this is an appropriate IETF working group that
> is being proposed:
>





Re: Building a new work group for public information retrieval protocol, ask for advices.

2004-01-09 Thread wang liang

DRIS will build a framework to integrate all kinds of information on the
Internet. It is an information retrieval infrastructure.
But there is still no working group for information retrieval in the IETF,
even though it may be the most important service on the Internet.

There have been some preliminary discussions at http://www.irtf.org,
specifically about information retrieval infrastructure.

So building a working group may make all related work (including DASL) more
efficient.

What is your opinion?




> There used to exist a DASL WG and its main proposal included a framework
> for sending Search requests to Web servers (including WebDAV servers).
> The framework was complemented by one proposed syntax for searching
> the resources stored directly on that server according to their
> metadata values.  Another syntax for the same framework could
> easily allow some Web servers to act as search aggregators or
> proxies for other repositories -- including not just a larger
> group of Web servers but also non-HTTP URLs could be returned.
>
> Now the DASL WG doesn't exist, but the work still continues on
> the WebDAV WG mailing list.





New BOF in Application Area (Internet information retrieval infrastructure)

2004-02-17 Thread wang liang
There has been some discussion of Internet information retrieval in the
groups of the IRTF. Now this issue will be discussed in a BOF in the
Applications Area.

As one of the most important services of the Internet, information retrieval
is still far from what we expect: precise, comprehensive, and fresh
information. This problem may become more serious with the rapid development
of the Internet. We need to pay more attention to this issue, and this BOF
is for exactly that. We hope you can participate.

The date of this BOF will be determined soon.

The content of the BOF:

Internet Information Retrieval Infrastructure (iiri)
====================================================
CHAIRS: Guo Yiping ([EMAIL PROTECTED])
Wang Liang ([EMAIL PROTECTED])

Co-Chair: Andrew Newton([EMAIL PROTECTED])

What is the main purpose of the Internet? Information retrieval and
exchange. But what is the most important criterion for judging a network?
Perhaps communication speed. Yet ordinary users do not feel that the search
service on a GB Internet is any better than on MB networks. The great
progress of the Internet has not brought a great improvement in its main
service.
The Internet is drifting further and further from its original aim,
"knowledge source of human being", and turning into a jumbled sea of
information. Many experts are mainly concerned with the "physical Internet",
but now we need to pay more attention to the "information Internet".
Information retrieval may be the most important service of the Internet, but
there is still no dedicated working group for it. Current commercial search
engines face many bottlenecks in coverage and recency. Their service is far
from what we expect. They are just web page search systems. We can also get
information from many other information resources such as special databases,
FTP search engines, P2P, etc. So a working group for an Internet information
retrieval system is very necessary. We have proposed a basic information
retrieval framework, DRIS (Domain Resource Integration System), for this
issue. Any related topic can be discussed in this group.

AGENDA:
Draft agenda for the BOF:
--
History of IETF work in this area                                   15 min
Introduction to problem space and DRIS                              15 min
Internet Information retrieval infrastructure and digital library   15 min
draft-liang-irpdl-03.txt                                            15 min
IPv6 and information retrieval system                               15 min
Discussion                                                          remaining time
Description of this working group (DRIS):
With the rapid increase in web pages, the coverage of search engines will
become poorer and the update interval much longer. If the current
architecture of search engines remains in use, finding precise and
comprehensive information will become an impossible mission. This problem
will be more serious when IPv6 technology is widely deployed in
communication networks. The problem of "too much information means no
information" may become a disaster as information explodes. To solve this
problem, there should be an efficient information management system for the
Internet.
In this group, the Domain Resource Integrated System (DRIS) will be
proposed. DRIS is a distributed information retrieval system, which will
build the information retrieval infrastructure for the Internet and can also
be regarded as a kind of Internet information management system.
DRIS is a hierarchical distributed search system and comprises three kinds
of information retrieval system: conventional database system, distributed
search system, and metadata harvest system. We will first define the basic
search systems and then define the entire DRIS.
Specific work items are:
1 Standard distributed search system. It defines the platform-independent
search interface and a collection description standard for heterogeneous
information resources. An I-D, "information retrieval protocol for digital
resources", has been proposed.
2 Standard metadata harvest system. A protocol based on an available open
standard such as OAI will be proposed. It will define a standard metadata
format that is compatible with most database systems.
3 Standard public web page search system.
4 DRIS. It will define the entire DRIS, including its overall architecture,
the relations between different nodes, etc.
5 DRIS and IPv6. Cooperation with the IPv6 WG will be proposed. IPv6 will be
the most distinctive feature of the next-generation Internet. IPv6 is still
being improved, and any technology that benefits the Internet can be added
to the IPv6 system. Since searching is the main service for most Internet
users and this service is not very satisfactory on the current Internet, why
not take this requirement into account when building the new Internet? For
example, in IPv6 all kinds of data flows are assigned a priority, so the
Internet could guarantee a high priority to DRIS data flows. So some
consideration may be needed for the relation

Re: dris mailing list

2004-03-01 Thread wang liang

Because this BOF was only confirmed on Feb 16, the mailing list and related
archive are not well prepared yet.
These problems will be solved as soon as possible.

Thanks.
Wang Liang


- Original Message -
From: "Melinda Shore" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Monday, March 01, 2004 1:01 PM
Subject: dris mailing list


> I realize that the DRIS BOF was put together somewhat quickly, but
> the mailing list situation is a little frustrating.
>
> 1) there's no mailing list archive at the URL given on the agenda
> 2) there are no subscription directions on the agenda
> 3) mail sent to [EMAIL PROTECTED] bounces
> 4) mail sent to [EMAIL PROTECTED] bounces
> 5) mail sent to [EMAIL PROTECTED] bounces
>
> Because of the IETF's tradition of both working and making decisions
> on mailing lists, it's really important that there be a functioning
> mailing list for each of the various efforts.
> It would be helpful in the future if stuff like this can be verified
> before it's announced.
>
> Thanks,
>
> Melinda
>
>
>





Re: dris mailing list

2004-03-02 Thread wang liang
Mailing list: [EMAIL PROTECTED]
To subscribe: To join the DRIS discussion list, send a request to
[EMAIL PROTECTED] with the word "subscribe" in the Subject line and in the
message body.
The related archive will be organized at
http://202.114.9.3/dl-lib/English/main.htm.

Wang Liang

- Original Message -
From: "wang liang" <[EMAIL PROTECTED]>
To: "Melinda Shore" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Monday, March 01, 2004 7:57 PM
Subject: Re: dris mailing list


>
> Because this BOF is determined in Feb 16,this mail list and relatted archive
> is not prepared very well.
> These problems will be solved as soon as possible.
>
> Thanks.
> Wang Liang
>







Welcome to DRIS discussion list!

2004-03-16 Thread wang liang
Welcome to the DRIS discussion list!

After holding the BOF at the 59th meeting, we will start the WG application
process. On the advice of APPS, we should still find more people to join the
discussion list. DRIS is a large system, and without enough people the new
WG may not meet its milestones. If you are interested in this topic, we
sincerely hope you will join the discussion.

According to the agenda of IIRI (http://www.ietf.org/ietf/04mar/iiri.txt),
this new WG will cover Internet search engines, digital libraries,
information grids, etc. Any related topic can be discussed on this list. We
all have a common goal: a usable Internet information infrastructure.

Although this is the first time the IETF has organized a formal discussion
of the Internet information retrieval problem, there have been many sporadic
discussions of it elsewhere, such as in the IRTF. Search engines have been
the hottest topic since 2003. Some disadvantages of current search engines
have led many experts to look for a better solution. The IETF may be the
best place to discuss a new Internet information retrieval system.

Mailing list: [EMAIL PROTECTED]
Subscribe: [EMAIL PROTECTED]; to subscribe, send a message with the title
"subscribe"
New site: http://dris.hust.edu.cn

Regards
Wang Liang





A modest proposal - allow the ID repository to hold xml

2003-09-02 Thread wang liang
Rosen, Brian wrote:
>Allow the submission of an xml file meeting the requirements of RFC2629
>along with the text file (and optional ps file) for an Internet Draft.

  I totally agree. RFC2629 itself may need some improvement. Some protocols,
like UDDI, are now written entirely in XML Schema. Even now I still have to
work out how to use nroff, and I can't find a Windows version of it. It is
simply inconvenient.
  If you have ever written programs for it, you will find that searching XML
is easier than searching any other format. XML may be the most efficient way
to describe a protocol, so how can we refuse it?
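As a small illustration of the point about searching, the sketch below finds the sections of a draft that mention a given word. The sample is a trimmed, made-up fragment that uses RFC2629-style element names (rfc, front, title, middle, section, t); it is not a complete, valid draft, and the function name is only for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical fragment in the style of an RFC2629 source file.
DRAFT = """<rfc docName="draft-example-00">
  <front><title>An Example Protocol</title></front>
  <middle>
    <section title="Introduction"><t>Why this protocol exists.</t></section>
    <section title="Search Interface"><t>How a client sends a query.</t></section>
  </middle>
</rfc>"""


def sections_mentioning(xml_text: str, word: str) -> list:
    """Return the titles of sections whose title or text contains the word."""
    root = ET.fromstring(xml_text)
    word = word.lower()
    found = []
    for section in root.iter("section"):
        text = " ".join(t.text or "" for t in section.iter("t"))
        if word in text.lower() or word in section.get("title", "").lower():
            found.append(section.get("title"))
    return found


print(sections_mentioning(DRAFT, "query"))
```

Doing the same on the plain-text version would mean guessing section boundaries from indentation and numbering.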