> Browsing your bugzilla, I'm guessing this is premature, but how's that
> journal-import comin' along? :) I would like to make a feature-request,
> that dw be able to import "archived LJs" which exist as a set of downloaded
> XML data (such as via ljdump or ljarchive). Because the import bug I read
> specifically only mentioned porting journals that are web-accessible. If I
> missed that in bug # 62 and this is already in the queue, apologies for the
> time-wastage.
>
> ...however, if this were a farmable task... there may be a number of coders
> such as myself who would jump on the task!... (If I were able to barter a
> weekend's work on it for beta accounts for myself and a few other testers...
> even better?)
The end goal is to have the ability for someone to get a reasonably
complete copy of their data from LJ (or any other LJ-based site) and
import it to Dreamwidth. The top three things I want to see imported
are Entries, Tags, and Userpics. If you have all of that, then you'll
get 95% of what people care about (I believe) and we can work on
getting more later, if time permits.
With that in mind, the plan for this project is to get some sort of
tool (there are already a few written, we need to figure out which one
would be best) and encapsulate it in a TheSchwartz worker so that we
can queue up import jobs. The jobs will then run (using this tool)
and import the data. The jobs run asynchronously and don't
have to complete quickly - they just have to complete at some point. We
can rate limit the imports on the server to ensure we don't overload
things, too, as importing data in bulk is going to quickly balloon the
database.
If you're interested in the project, I think the first step will be
identifying the best tool to use. Some requirements:
1) Open source, we need to be able to hack on it.
2) Prefer active/living projects. (E.g., jbackup is great and easy to
hack on, but it's not been touched in years.)
3) Ideally, the tool should support simple command line usage:
bin/mytool.py --server livejournal.com --user xb95 --password
foobarbaz --output-file /home/dw/tmp/imports/dw-xb95-import-TIMESTAMP
4) The output format needs to be machine readable (GDBM, XML, etc) and
preserve all entry/comment/etc attributes.
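If a tool meets requirement 3, the worker can assemble its command line mechanically. A sketch, where `bin/mytool.py` is the hypothetical exporter from the example above and the flags just mirror it:

```python
import subprocess
import time

def build_export_command(server, user, password,
                         output_dir="/home/dw/tmp/imports"):
    """Assemble the command line from requirement 3. The tool name and
    flags are the hypothetical ones from the example above."""
    timestamp = int(time.time())
    outfile = f"{output_dir}/dw-{user}-import-{timestamp}"
    return ["bin/mytool.py",
            "--server", server,
            "--user", user,
            "--password", password,
            "--output-file", outfile]

# The worker would then run it, e.g.:
#   subprocess.run(build_export_command("livejournal.com", "xb95", "foobarbaz"),
#                  check=True)
```

That's the whole reason the command-line requirement matters: no screen-scraping or interactive prompts, just argv in, export file out.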
If all of that is true, then we can easily schedule jobs to import
data. Then, we can write our own loader that takes the import files
from the script and uses them to feed into our database, posting
entries, adding userpics, etc etc.
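The loader side might start out looking something like this - note the element and attribute names here are invented for illustration, since the real format depends entirely on which tool we pick:

```python
import xml.etree.ElementTree as ET

def load_entries(xml_text):
    """Parse entries out of a hypothetical XML export file. The tag
    names ("entry", "subject", "body") are made up for this sketch."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.iter("entry"):
        entries.append({
            "subject": entry.findtext("subject", default=""),
            "body": entry.findtext("body", default=""),
            # Keep every attribute (date, security level, etc.)
            # intact, per requirement 4.
            "attrs": dict(entry.attrib),
        })
    return entries
```

Each dict would then get handed to our own posting code to create the entry on the DW side.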
An alternative to the above is, if some tool already does
export-from-site-plus-import-to-another-site (does ljMigrate? I
think?), then we might be able to just use it as is. Tell it to
export from the source server and import to DW. Of course, we will
need to thoroughly vet the process it uses and make sure it's going
to work and not destroy data. (This amounts to a security audit.)
Okay, this is a bit brain-dumpy. Feedback encouraged.
> Glad to see this project under development; looking forward to seeing what
> you've come up with.
Thank you, glad to have you!
--
Mark Smith / xb95
[email protected]
_______________________________________________
dw-discuss mailing list
[email protected]
http://lists.dwscoalition.org/cgi-bin/mailman/listinfo/dw-discuss