Re: [Dspace-tech] Sequence ID generation
Thanks to everyone for their input on this. Well, I had no idea that this would open such a can of worms! Mea culpa...

I guess my question is really more about the philosophical aspect of *how* the sequence ID is (mis)used. I have items being submitted to DSpace in response to a trigger from another system, but I need to pass back to that system an identifier which matches the *bitstream* per se (hehe), not just the item metadata page. In this particular project, it is the *bitstreams* which are considered persistent objects, not just the items. Problems arise in this case when trying to programmatically extract pointers to bitstreams for external systems, particularly since bitstream names need not be unique (is that right?).

But I'm wandering into a [dspace-general] discussion here... I note Richard's previous instigation of discussion on these issues at
http://mailman.mit.edu/pipermail/dspace-general/2003-September/15.html
and it would appear that the general consensus is that persistent bitstream IDs are *not* a good idea. However, when faced with a project that *requires* them, what is one to do?

Regards
Gary

Gary Browne
Development Programmer
Library IT Services
University of Sydney
Australia
ph: 61-2-9351 5946

-----Original Message-----
From: Larry Stone [mailto:[EMAIL PROTECTED]
Sent: Wednesday, 9 May 2007 6:18 AM
To: Richard Rodgers
Cc: Gary Browne; dspace-tech@lists.sourceforge.net
Subject: Re: [Dspace-tech] Sequence ID generation

-------------------------------------------------------------------------
This SF.net email is sponsored by DB2 Express
Download DB2 Express C - the FREE version of DB2 express and take
control of your XML. No limits. Just data. Click to get it now.
http://sourceforge.net/powerbar/db2/
_______________________________________________
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech
Re: [Dspace-tech] Sequence ID generation
> First, it is assigned sequentially and IDs are not reused if a bitstream
> is deleted. There is no magic ordering, and it was *not* intended for
> organizing a set of bitstreams into a meaningful sequence (e.g. PDF
> chapters of a book). Its sole purpose is to provide a *durable* unique
> ID for a bitstream - think of it as a 'sub-handle' ID - modulo an item

There's actually a bug in the data model, then. It's possible to get the same sequence ID reused, because when adding a Bitstream, the code only looks for the highest existing SequenceID and increments that. To reproduce:

1. Take an existing Item, go into the "Edit Item" admin page (/dspace/tools/edit-item), and add a new Bitstream with a distinctive name, say "foo.pdf".

2. Determine its Sequence ID. Go to the Item page /dspace/handle/ and observe the "View/Open" link next to your bitstream; the path element after its handle is the SequenceID. It should be the highest SequenceID there, since it was most recently added. There are some "invisible" Bitstreams (like licenses) that also take up SIDs.

3. Go back to the "Edit" page and delete that newest bitstream.

4. Add a different bitstream with a different name, say "bar.pdf".

5. Go to a freshly-loaded copy of the Item page, and observe that "bar.pdf" has the same SequenceID that "foo.pdf" had before.

I'll submit this as a bug on sourceforge too.

-- Larry
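The five steps above can be sketched as a small stand-alone simulation. This is only a model, not DSpace code: an item is reduced to a list of sequence IDs, with -1 standing for "not yet assigned", and the class and method names (SequenceIdReuseDemo, assignSequenceIds, addBitstream) are made up for illustration. The assignment logic follows the max-plus-one approach described in this thread.

```java
import java.util.ArrayList;
import java.util.List;

public class SequenceIdReuseDemo {

    /** Gives every entry marked -1 the next ID after the highest one currently present. */
    static void assignSequenceIds(List<Integer> ids) {
        int sequence = 0;
        // find the highest current sequence number
        for (int id : ids) {
            if (id > sequence) {
                sequence = id;
            }
        }
        // start sequencing entries without sequence IDs
        sequence++;
        for (int i = 0; i < ids.size(); i++) {
            if (ids.get(i) < 0) {
                ids.set(i, sequence);
                sequence++;
            }
        }
    }

    /** Adds one unassigned bitstream to the item and returns the ID it receives. */
    static int addBitstream(List<Integer> ids) {
        ids.add(-1);
        assignSequenceIds(ids);
        return ids.get(ids.size() - 1);
    }

    public static void main(String[] args) {
        List<Integer> item = new ArrayList<>();
        addBitstream(item);                   // e.g. an "invisible" license bitstream
        addBitstream(item);                   // an existing file
        int fooId = addBitstream(item);       // step 1: add foo.pdf
        item.remove(Integer.valueOf(fooId));  // step 3: delete it again
        int barId = addBitstream(item);       // step 4: add bar.pdf
        System.out.println("foo.pdf=" + fooId + ", bar.pdf=" + barId);
        // prints "foo.pdf=3, bar.pdf=3"
    }
}
```

The reuse only happens when the deleted bitstream held the highest ID in the item; deleting one from the middle leaves a gap that is not filled.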
Re: [Dspace-tech] Sequence ID generation
Hi Gary:

Here's a little more explanation/history of the sequence ID.

First, it is assigned sequentially and IDs are not reused if a bitstream is deleted. There is no magic ordering, and it was *not* intended for organizing a set of bitstreams into a meaningful sequence (e.g. PDF chapters of a book). Its sole purpose is to provide a *durable* unique ID for a bitstream - think of it as a 'sub-handle' ID - modulo an item (sorry for the Latin again, it just kind of crept in).

DSpace originally used the bitstream database key, but while unique, this wasn't durable in the sense that if you moved to a different database, or compressed, etc., the key might change. Whereas the sequence ID is a bona fide (damn!) piece of metadata, albeit a fairly opaque one. Remember that the filename need not be unique (you can have 2 bitstreams with the same name), so we do need something in this role.

We actually kicked around various proposals (e.g. MD5 checksums, date stamps, etc.), but the sequence ID won primarily because it resulted in shorter, easier-to-type URLs.

Hope this sheds some light,

Richard R

On Tue, 2007-05-08 at 10:01 +1000, Gary Browne wrote:
> Thanks Claudia
>
> Though I'm only inferring what a "numerus currens" is - I'm not really up with
> my Latin.
>
> Cheers
> Gary
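To make the 'sub-handle' idea a bit more concrete, here is a rough sketch of pulling the handle and sequence ID back out of a bitstream path laid out as described in this thread (dspace url/bitstream/handle/sequence ID/filename). It assumes the handle is exactly two path segments (prefix/suffix) and that the filename contains no slashes; the class name, the BitstreamRef holder and the example path are all invented for illustration and are not part of DSpace.

```java
public class BitstreamPathSketch {

    /** Simple holder for the three parts of a bitstream path (hypothetical type). */
    static final class BitstreamRef {
        final String handle;
        final int sequenceId;
        final String filename;

        BitstreamRef(String handle, int sequenceId, String filename) {
            this.handle = handle;
            this.sequenceId = sequenceId;
            this.filename = filename;
        }
    }

    /** Splits ".../bitstream/prefix/suffix/sequenceId/filename" into its parts. */
    static BitstreamRef parse(String path) {
        String marker = "/bitstream/";
        int start = path.indexOf(marker);
        if (start < 0) {
            throw new IllegalArgumentException("No /bitstream/ segment in: " + path);
        }
        String[] parts = path.substring(start + marker.length()).split("/");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Expected prefix/suffix/sequenceId/filename in: " + path);
        }
        // Assumption: the handle is the first two segments joined by a slash.
        String handle = parts[0] + "/" + parts[1];
        return new BitstreamRef(handle, Integer.parseInt(parts[2]), parts[3]);
    }

    public static void main(String[] args) {
        BitstreamRef ref = parse("/dspace/bitstream/123456789/42/3/foo.pdf");
        System.out.println(ref.handle + " #" + ref.sequenceId + " (" + ref.filename + ")");
    }
}
```

The handle plus the sequence ID is then the pair an external system would store, since the filename alone need not be unique within an item.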
Re: [Dspace-tech] Sequence ID generation
Thanks Claudia

Though I'm only inferring what a "numerus currens" is - I'm not really up with my Latin.

Cheers
Gary

Gary Browne
Development Programmer
Library IT Services
University of Sydney
Australia
ph: 61-2-9351 5946

-----Original Message-----
From: Claudia Jürgen [mailto:[EMAIL PROTECTED]
Sent: Thursday, 3 May 2007 5:12 PM
To: Gary Browne
Cc: dspace-tech@lists.sourceforge.net
Subject: Re: [Dspace-tech] Sequence ID generation
Re: [Dspace-tech] Sequence ID generation
Hi Gary,

the sequence number is generated in:
org.dspace.content.Item
update()

    // Set sequence IDs for bitstreams in item
    int sequence = 0;
    Bundle[] bunds = getBundles();

    // find the highest current sequence number
    for (int i = 0; i < bunds.length; i++)
    {
        Bitstream[] streams = bunds[i].getBitstreams();

        for (int k = 0; k < streams.length; k++)
        {
            if (streams[k].getSequenceID() > sequence)
            {
                sequence = streams[k].getSequenceID();
            }
        }
    }

    // start sequencing bitstreams without sequence IDs
    sequence++;

    for (int i = 0; i < bunds.length; i++)
    {
        Bitstream[] streams = bunds[i].getBitstreams();

        for (int k = 0; k < streams.length; k++)
        {
            if (streams[k].getSequenceID() < 0)
            {
                streams[k].setSequenceID(sequence);
                sequence++;
                streams[k].update();
            }
        }
    }

it's just a numerus currens.

sunny greetings

Claudia Jürgen
University Dortmund

Gary Browne wrote:
> Hi everyone - I submitted this question previously but had no
> replies...thought I'd try my luck again with a cunningly disguised
> turned about subject line.
>
> Regarding the sequence ID, the number between the handle and the
> filename in a DSpace bitstream URL:
>
> dspace url/bitstream/handle/sequence ID/filename
>
> can anyone tell me how the sequence ID number is generated by DSpace?
> Does it simply correspond to the sequence of bitstreams as outlined in
> the contents file?
>
> Thanks
>
> Gary
>
> Gary Browne
> Development Programmer
> Library IT Services
> University of Sydney
> Australia
> ph: 61-2-9351 5946
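For comparison with the max-plus-one logic in update(), here is one way reuse could be avoided. It is a sketch under a loud assumption: each item keeps a counter of the highest sequence ID it has ever handed out, and that counter survives deletions. No such field appears in the code quoted above; the class, the field highestEverAssigned and the method names are hypothetical and only illustrate the idea.

```java
import java.util.ArrayList;
import java.util.List;

public class DurableSequenceSketch {

    // Sequence IDs of the bitstreams currently attached to the item.
    private final List<Integer> liveIds = new ArrayList<>();

    // Hypothetical per-item counter that would have to be stored with the item.
    private int highestEverAssigned = 0;

    /** Hands out the next ID based on the counter, not on the surviving bitstreams. */
    int addBitstream() {
        highestEverAssigned++;
        liveIds.add(highestEverAssigned);
        return highestEverAssigned;
    }

    /** Removes a bitstream; the counter is deliberately left untouched. */
    void deleteBitstream(int sequenceId) {
        liveIds.remove(Integer.valueOf(sequenceId));
    }

    public static void main(String[] args) {
        DurableSequenceSketch item = new DurableSequenceSketch();
        item.addBitstream();               // some existing bitstream
        int fooId = item.addBitstream();   // foo.pdf
        item.deleteBitstream(fooId);
        int barId = item.addBitstream();   // bar.pdf
        System.out.println("foo.pdf=" + fooId + ", bar.pdf=" + barId);
        // prints "foo.pdf=2, bar.pdf=3"
    }
}
```

The trade-off is an extra piece of state per item that has to be persisted and migrated along with the rest of the metadata.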