Hi all,

About 10 days ago I wrote a message saying that I was getting
"Authorization Errors", but I had no idea what could be causing the
problem.
At that time I didn't give enough information for anyone to help me.
Thanks to Arno Lehmann for trying :)

The scenario is this: I have a Bacula Director running on server
"maculele" and bacula-fd running on almost 10 different servers. The
problem only happens on server "newton". The File Daemon is refusing
the Director's connection, and the following error is logged:

-------------------------------------------------------------------------
05-Nov 21:00 maculele-dir: Start Backup JobId 603,
Job=Newton.2005-11-05_21.00.00
05-Nov 21:00 maculele-dir: Newton.2005-11-05_21.00.00 Fatal error: Unable
to authenticate with File daemon. Possible causes:
Passwords or names not the same or
Maximum Concurrent Jobs exceeded on the FD or
FD networking messed up (restart daemon).
Please see http://www.bacula.org/html-manual/faq.html#AuthorizationErrors
for help.
------------------------------------------------------------------------

Well, this time I tried to collect some more information.

1) I compared the versions of almost all packages running on my servers
and I didn't find anything different.

2) I installed the director on the server where the problem was occurring,
and then I realized that the problem affects any attempt to connect to and
authenticate with Bacula services. I couldn't even run the console,
because it couldn't connect to the director.

3) I tried to connect through telnet but it didn't work. In fact, all
Bacula ports are open, but as soon as the connection is established it is
closed by the remote host:

-----------------------------------------------------------------
[EMAIL PROTECTED]:~$ telnet newton 9101
Trying 192.168.0.8...
Connected to newton.dcc.ufba.br.
Escape character is '^]'.
Connection closed by foreign host.
[EMAIL PROTECTED]:~$
------------------------------------------------------------------

This is different from what happens when connecting to a closed port:

-------------------------------------------------------------------------------
[EMAIL PROTECTED]:~$ telnet newton 9100
Trying 192.168.0.8...
telnet: Unable to connect to remote host: Connection refused
[EMAIL PROTECTED]:~$
---------------------------------------------------------------------------------
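
To make the difference between those two telnet sessions easier to
reproduce, here is a minimal Python sketch. It is not part of Bacula; the
function name probe and the three result labels are made up for
illustration. It assumes that a healthy daemon keeps the connection open
and waits for the client to speak first, so a read that times out is
treated as "open".

```python
import socket


def probe(host, port, wait=1.0):
    """Classify a TCP port as 'refused', 'closed_by_peer' or 'open'.

    Hypothetical helper for illustration only, not a Bacula tool.
    """
    try:
        sock = socket.create_connection((host, port), timeout=wait)
    except ConnectionRefusedError:
        # Nothing is listening: same as "Connection refused" in telnet.
        return "refused"
    with sock:
        sock.settimeout(wait)
        try:
            data = sock.recv(1)
        except socket.timeout:
            # Connection stayed up and the peer is waiting for us to talk.
            return "open"
        except ConnectionResetError:
            # The peer tore the connection down with a reset.
            return "closed_by_peer"
        # An empty read means the peer closed right after accepting.
        return "closed_by_peer" if data == b"" else "open"
```

Going by the telnet output above, I would expect probe("newton", 9101) to
report the "closed_by_peer" case and probe("newton", 9100) to report
"refused", but I have only checked this against telnet, not with the
script.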

4) Last attempt: I ran strace to monitor the system calls on both
systems and tried to figure out the differences.

I ran the bconsole program on maculele (this server works fine), just to
see the console connecting to the director, and got the following
result (I'm showing only the final lines):

------------------------------------------------------------------------------------------------------------------
clone(child_stack=0x40ab1b48,
flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID|CLONE_DETACHED,
parent_tidptr=0x40ab1bf8, {entry_number:6, base_addr:0x40ab1bb0,
limit:1048575, seg_32bit:1, contents:0, read_exec_only:0,
limit_in_pages:1, seg_not_present:0, useable:1},
child_tidptr=0x40ab1bf8) = 12726
futex(0x806bb50, FUTEX_WAKE, 1)         = 1
time(NULL)                              = 1131378562
write(3, "\0\0\0\32", 4)                = 4
write(3, "Hello *UserAgent* calling\n", 26) = 26
read(3, "\0\0\0009", 4)                 = 4
read(3, "auth cram-md5 <1703362652.113137"..., 57) = 57
write(3, "\0\0\0\27", 4)                = 4
write(3, "P4+EI119mW+PsS85V//BuC\0", 23) = 23
select(4, [3], NULL, NULL, {180, 0})    = 1 (in [3], left {179, 960000})
read(3, "\0\0\0\r", 4)                  = 4
read(3, "1000 OK auth\n", 13)           = 13
gettimeofday({1131378562, 978543}, {120, 0}) = 0
gettimeofday({1131378562, 978705}, {120, 0}) = 0
gettimeofday({1131378562, 978863}, {120, 0}) = 0
gettimeofday({1131378562, 979020}, {120, 0}) = 0
gettimeofday({1131378562, 979156}, {120, 0}) = 0
uname({sys="Linux", node="maculele", ...}) = 0
time(NULL)                              = 1131378562
write(3, "\0\0\0005", 4)                = 4
write(3, "auth cram-md5 <1032677570.113137"..., 53) = 53
select(4, [3], NULL, NULL, {180, 0})    = 1 (in [3], left {179, 960000})
read(3, "\0\0\0\27", 4)                 = 4
read(3, "/61y86MgOH474FsNak+t2D\0", 23) = 23
write(3, "\0\0\0\r", 4)                 = 4
write(3, "1000 OK auth\n", 13)          = 13
read(3, "\0\0\0009", 4)                 = 4
read(3, "1000 OK: maculele-dir Version: 1"..., 57) = 57
time(NULL)                              = 1131378563
futex(0x806bb50, FUTEX_WAKE, 1)         = 1
futex(0x806bb40, FUTEX_WAKE, 1)         = 1
futex(0x808ec84, FUTEX_WAKE, 1)         = 1
write(1, "1000 OK: maculele-dir Version: 1"..., 571000 OK: maculele-dir
Version: 1.36.2 (28 February 2005)
) = 57
write(1, "Enter a period to cancel a comma"..., 36Enter a period to
cancel a command.
) = 36
open("/root/.bconsolerc", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such
file or directory)
ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig -icanon -echo
...}) = 0
write(1, "*", 1*)                        = 1
select(1, [0], NULL, NULL, {30, 0}
----------------------------------------------------------------------------------------------------------------

Then I did the same with newton (the server with the problem):

----------------------------------------------------------------------------------------------------------------
clone(child_stack=0x40abbb48,
flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID|CLONE_DETACHED,
parent_tidptr=0x40abbbf8, {entry_number:6, base_addr:0x40abbbb0,
limit:1048575, seg_32bit:1, contents:0, read_exec_only:0,
limit_in_pages:1, seg_not_present:0, useable:1},
child_tidptr=0x40abbbf8) = 12967
futex(0x806bb50, FUTEX_WAKE, 1)         = 1
time(NULL)                              = 1131383077
write(3, "\0\0\0\32", 4)                = 4
write(3, "Hello *UserAgent* calling\n", 26) = -1 EPIPE (Broken pipe)
--- SIGPIPE (Broken pipe) @ 0 (0) ---
time(NULL)                              = 1131383077
open("/etc/localtime", O_RDONLY)        = 4
fstat64(4, {st_mode=S_IFREG|0644, st_size=286, ...}) = 0
mmap2(NULL, 131072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0x40abc000
read(4,
"TZif\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\3\0\0\0\3\0"...,
131072) = 286
close(4)                                = 0
munmap(0x40abc000, 131072)              = 0
time([1131383077])                      = 1131383077
rt_sigaction(SIGPIPE, {0x40232a70, [], 0}, {SIG_IGN}, 8) = 0
socket(PF_FILE, SOCK_DGRAM, 0)          = 4
fcntl64(4, F_SETFD, FD_CLOEXEC)         = 0
connect(4, {sa_family=AF_FILE, path="/dev/log"}, 16) = 0
send(4, "<27>Nov  7 14:04:37 bacula-conso"..., 142, 0) = 142
rt_sigaction(SIGPIPE, {SIG_IGN}, NULL, 8) = 0
write(1, "07-Nov 14:04 bconsole:  Error: b"..., 11907-Nov 14:04
bconsole:  Error: bnet.c:406 Write error sending 26 bytes to Director
daemon:newton:9101: ERR=Broken pipe
) = 119
nanosleep({5, 0}, NULL)                 = 0
time(NULL)                              = 1131383082
futex(0x806bb50, FUTEX_WAKE, 1)         = 1
futex(0x806bb40, FUTEX_WAKE, 1)         = 1
futex(0x808ec84, FUTEX_WAKE, 1)         = 1
write(1, "Director authorization problem.\n"..., 156Director
authorization problem.
Most likely the passwords do not agree.
Please see http://www.bacula.org/html-manual/faq.html#AuthorizationErrors
for help.
) = 156
write(2, "ERR=", 4ERR=)                     = 4
ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig -icanon -echo
...}) = 0
ioctl(0, SNDCTL_TMR_START or TCSETS, {B38400 opost isig icanon
echo ...}) = 0
ioctl(0, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo
...}) = 0
munmap(0x402a8000, 4096)                = 0
exit_group(1)
----------------------------------------------------------------------------------------------------------------

I think the problem is here:
When it works: "write(3, "Hello *UserAgent* calling\n", 26) = 26"
When it doesn't work: "write(3, "Hello *UserAgent* calling\n", 26) = -1
EPIPE (Broken pipe)"

And when the problem occurs, the password information isn't even sent!
I think the authentication starts at this line (in the first example):
"read(3, "auth cram-md5 <1703362652.113137"..., 57) = 57"
I can't see anything like that in the second example.
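
For anyone reading the traces: as far as I can tell, every message is
preceded by a 4-byte write or read that carries the length of what
follows ("\0\0\0\32" is octal 32 = 26, then the 26-byte Hello). The Python
sketch below only illustrates that reading of the strace output. It
assumes the prefix is a plain big-endian unsigned length, and the names
frame and split_frames are made up; this is not Bacula's own code.

```python
import struct


def frame(payload: bytes) -> bytes:
    """Prefix a payload with its length as 4 big-endian bytes (assumed format)."""
    return struct.pack(">I", len(payload)) + payload


def split_frames(stream: bytes):
    """Split a captured byte stream back into its payloads."""
    frames = []
    offset = 0
    while offset + 4 <= len(stream):
        (length,) = struct.unpack_from(">I", stream, offset)
        offset += 4
        if offset + length > len(stream):
            break  # incomplete message at the end of the capture
        frames.append(stream[offset:offset + length])
        offset += length
    return frames


hello = b"Hello *UserAgent* calling\n"
ok = b"1000 OK auth\n"
captured = frame(hello) + frame(ok)
print(split_frames(captured))
```

Read this way, the failing trace on newton gets as far as the 4-byte
length prefix and breaks on the very first payload, before any "auth
cram-md5" message is exchanged.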

Is there any log that would be useful for tracking down the cause of this
problem?
Could it be a problem with some library whose name I don't even know?
Please help me...

Tássia.

